Text File | 1995-08-29 | 90.3 KB | 1,879 lines |
- MPEG-2 Frequently Asked Questions List
- Copyright 1994 by the MPEG Software Simulation Group
- Draft 3.4 (June 18, 1994)
-
- (This is a freely available publication owned by a private group. It
- may not be resold or published for personal or organizational profit--
- be that financial or vanity--under any other name)
-
- Authors: Chad Fogg (cfogg@netcom.com), ..
-
- 1. MPEG is a DCT based scheme, right?
- 2. What does the MPEG video syntax feature that codes video efficiently?
- 3. What does the syntax provide for error robustness?
- 4. What is the significance of each layer in MPEG video ?
- 5. How does the syntax facilitate parallelism?
- 6. I hear the encoder is not part of the standard?
- 7. Are some encoders better than others?
- 8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
- 9. What are Constrained Parameters Bitstreams (CPB) for video?
- 10. Why is Constrained Parameters so important?
- 11. Who uses constrained parameters bitstreams?
- 12. Are there ways of circumventing constrained parameters bitstreams for SIF
- class applications and decoders ?
- 13. Are there any other conformance points like CPB for MPEG-1?
- 14. What frame rates are permitted in MPEG?
- 15. Special prediction switches for MPEG-2
- 16. What is MPEG-2 Video Main Profile and Main Level?
- 17. Does anybody actually use the scalability modes?
- 18. What's the difference between Field and Frame pictures?
- 19. What do B-pictures buy you?
- 20. Why do some people hate B-frames?
- 21. Why was the 16x16 area chosen?
- 22. Why was the 8x8 DCT size chosen?
- 23. What is motion compensated prediction, and why is it a pain?
- 24. What are the various prediction modes in MPEG-2?
- 24.1 Frame:
- 24.2 Field predictions in frame-coded pictures:
- 24.3 Field predictions in field-coded pictures:
- 24.4 16x8 predictions in field-coded pictures:
- 24.5 Dual Prime prediction in frame and field-coded pictures
- 24.6 Field and frame organized macroblocks:
- 25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
- 26. What is the reasoning behind MPEG syntax symbols?
- 27. Why bother to research compressed video when there is a standard?
- 28. Where can I get a copy of the latest MPEG-2 draft?
- 29. What are the latest working drafts of MPEG-2 ?
- 30. What is the latest version of the MPEG-1 documents?
- 31. What is the evolution of ISO standard documents?
- 32. Where is a good introductory paper to MPEG?
- 33. What are some journals on related MPEG topics ?
- 34. Is there a book on MPEG video?
- 35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
- 36. What happened to MPEG-3?
- 37. What is MPEG-4?
- 38. What are the scaleable modes of MPEG-2?
- 39. Why MPEG-2? Wasn't MPEG-1 enough?
- 40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
- 41. How do MPEG and JPEG differ?
- 42. How do MPEG and H.261 differ?
- 43. Is H.261 the de facto teleconferencing standard?
- 44. What is the TM rate control and adaptive quantization technique ?
- 45. How does the TM work?
- 46. What is a good motion estimation method, then?
- 47. Is exhaustive search "optimal" ?
- 48. What are some advanced encoding methods?
- 49. Is so-and-so really MPEG compliant ?
- 50. What are the tell-tale MPEG artifacts?
- 51. Where are the weak points of MPEG video ?
- 52. What are some myths about MPEG?
- 53. What is the color space of MPEG?
- 54. Don't you mean 4:1:1 ?
- 55. Why did MPEG choose 4:2:0 ? Isnt 4:2:2 the standard for TV?
- 56. What is the precision of MPEG samples?
- 57. What is all the fuss with cositing of chroma components?
- 58. How would you explain MPEG to the data compression expert?
- 59. How does MPEG video really compare to TV, VHS, laserdisc ?
- 60. What are the typical MPEG-2 bitrates and picture quality?
- 61. At what bitrates is MPEG-2 video optimal?
- 62. Why does film perform so well with MPEG ?
- 63. What is the best compression ratio for MPEG ?
- 64. Can MPEG be used to code still frames?
- 65. Is there an MPEG file format?
- 66. What are some pre-processing enhancements ?
- 67. Why use these "advanced" pre-filtering techniques?
- 68. What about post-processing enhancements?
- 69. Can motion vectors be used to measure object velocity?
- 70. How do you code interlaced video with MPEG-1 syntax?
- 71. Is MPEG patented?
- 72. How many cable box alliances are there?
- 73. Will there be an MPEG video tape format?
- 74. Where will we see MPEG in everyday life?
- 75. What is the best compression ratio for MPEG ?
- 76. Is there a MPEG CD-ROM format?
-
- 1. MPEG is a DCT based scheme, right?
-
- A. The DCT and Huffman algorithms receive the most press coverage (e.g.
- "MPEG is a DCT based scheme with Huffman coding"), but are in fact
- less significant when compared to the variety of coding modes
- signaled to the decoder as context-dependent side information.
-
- 2. What does the MPEG video syntax feature that codes video efficiently?
-
- A. Here are some examples of statistical conditions and how they are
- addressed by their syntax counterparts:
-
- 1. Occlusion: forwards or backwards temporal prediction in B pictures.
-
- 2. Smooth optical flow fields: variable length coding of 1-D prediction
- errors of motion vectors.
-
- 3. Interblock spatial correlation: 1-D prediction of DC coefficients in
- contiguous group of intra-coded macroblocks.
-
- 4. Temporal correlation: variable on/off coding of prediction error
- at the macroblock (no coded block pattern macroblock type) or
- individual block (63-state coded block pattern) layer.
-
- 5. Temporal de-correlation: forward, backwards, or bidirectional
- prediction.
-
- 6. Content dependent quality: macroblock quantization_scale (a.k.a. mquant)
-
- 7. Sub-sample temporal prediction accuracy: bi-linearly interpolated
- (filtered) "half-pel" block predictions.
-
- 8. Local spatial correlations: 8x8 2-D DCT.
-
- 9. Local adaptivity: regular non-overlapping grid of blocks and
- macroblocks.
-
- 10. Limited motion activity: skipped macroblocks in P pictures.
-
- 11. Co-planar motion: skipped macroblocks in B pictures.
-
- 12. Human Visual response to spatial frequencies: lossy scalar
- quantization of DCT coefficients.
-
-
- 3. How does the syntax provide robustness against errors ?
-
- There is a limited amount of redundancy in the MPEG side-information
- stream which permits resynchronization:
-
- 1. Byte-aligned start codes in the coded bitstream.
- 2. End of block codes in coded blocks.
- 3. Slices.
- 4. slice_vertical_position embedded as sub-field within slice start codes.
- 5. slices commencing at regular locations in picture (MPEG-2)
-
- 4. What is the significance of each layer in MPEG video ?
-
- Sequence:
- Set of pictures sharing same sampling dimensions, bit rate,
- chromaticity (MPEG-1), quantization matrices (MPEG-1 only).
-
- Group of Pictures:
- Random access point giving SMPTE time code within sequence.
- Guaranteed to start with an I picture.
-
- Picture:
- Samples of a common plane -- "captured" from the same time instant.
- New set of quantization matrices (MPEG-2 only)
-
- Slice:
- Error resynchronization unit of macroblocks.
- At the commencement of a slice, all inter-macroblock coding
- dependencies are reset. Likewise, all macroblocks within a common
- slice can be dependently coded.
-
- Macroblock:
- Least common multiple of Y, Cb, Cr 8x8 blocks in 4:2:0 sampling
- structure. For MPEG-1, the smallest granularity of temporal
- prediction.
-
- Block:
- Smallest granularity of spatial coding information.
-
-
- 5. How does the syntax facilitate parallelism?
-
- A. For MPEG-1, slices may consist of an arbitrary number of
- macroblocks. They can be independently decoded once the picture
- header side information is known. For parallelism below the slice
- level, the coded bitstream must first be mapped into fixed-length
- elements. Further, since macroblocks have coding dependencies on
- previous macroblocks within the same slice, the data hierarchy must be
- pre-processed down to the layer of DC DCT coefficients. After this,
- blocks may be independently inverse transformed and quantized,
- temporally predicted, and reconstructed to buffer memory. Parallelism
- is usually more of a concern for encoders. In many encoders today,
- block matching (motion estimation) and some rate control stages (such
- as activity and/or complexity measures) are processed for macroblocks
- independently. Finally, with the exception that all macroblock rows
- in Main Profile MPEG-2 bitstreams must contain at least one slice, an
- encoder has the freedom to choose the slice structure.
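-
- As a rough illustration of slice-level parallelism, the sketch below
- locates slice start codes so that each slice could be handed to a
- separate worker. It assumes the usual byte-aligned start code prefix
- (0x000001) followed by a value from 0x01 to 0xAF, that value being the
- slice vertical position. The name find_slices is a hypothetical helper,
- and no actual decoding is attempted.
-
```python
def find_slices(data: bytes):
    """Return (byte_offset, slice_vertical_position) for each slice start code."""
    slices = []
    pos = data.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(data):
        code = data[pos + 3]
        if 0x01 <= code <= 0xAF:          # range of slice start code values
            slices.append((pos, code))
        pos = data.find(b"\x00\x00\x01", pos + 3)
    return slices

# Toy byte string, not a real bitstream: a picture start code (0x00)
# followed by two slices on macroblock rows 1 and 2.
toy = (b"\x00\x00\x01\x00" + b"\xAA" * 4 +
       b"\x00\x00\x01\x01" + b"\xBB" * 4 +
       b"\x00\x00\x01\x02" + b"\xCC" * 4)
print(find_slices(toy))
```
-
- Each returned offset marks a point where a worker could begin parsing
- independently, provided it already has the picture header information.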
-
- 6. I hear the encoder is not part of the standard?
-
- A. The encoder rests just outside the normative scope of the standard,
- as long as the bitstreams it produces are compliant. The decoder,
- however, is almost deterministic: a given bitstream should
- reconstruct to a unique set of pictures. However, since the IDCT
- function is the ONLY non-normative stage in the decoder, an
- occasional error of a Least Significant Bit is permitted. The designer
- is free to choose among many DCT algorithms and implementations. The
- IEEE 1180 test referenced in Annex A of the MPEG-1 (ISO/IEC 11172-2) and
- MPEG-2 (ISO/IEC 13818-2) Video specifications spells out the
- statistical mismatch tolerance between the Reference IDCT, which
- is a separable 8x1 "Direct Matrix" DCT implemented with 64-bit floating
- point accuracy, and the IDCT you are testing for compliance.
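-
- To make the idea of a separable "Direct Matrix" IDCT concrete, here is
- a minimal sketch in double-precision floating point (Python floats are
- 64-bit). It is only an illustration of the textbook 8-point formula
- applied to rows and then columns, not the normative reference code, and
- the function names are invented for this example.
-
```python
import math

def idct_1d(coeffs):
    """Direct-matrix 8-point inverse DCT in double precision."""
    out = []
    for n in range(8):
        total = 0.0
        for k in range(8):
            c = math.sqrt(0.5) if k == 0 else 1.0
            total += c * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / 16)
        out.append(total / 2)
    return out

def idct_8x8(block):
    """Separable 2-D IDCT: transform the rows, then the columns."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[v][x] for v in range(8)]) for x in range(8)]
    # cols[x][y] holds sample (y, x); transpose back to row-major order
    return [[cols[x][y] for x in range(8)] for y in range(8)]

# A block with only a DC coefficient of 8 should reconstruct to a flat
# block of samples equal to 1 (to within floating point error).
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 8.0
samples = idct_8x8(dc_only)
print(round(samples[0][0], 6), round(samples[7][7], 6))
```
-
- A fast integer IDCT under test would be compared against output like
- this, with the allowed statistical mismatch defined by IEEE 1180.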
-
- 7. Are some encoders better than others?
-
- A. Definitely. For example, the motion estimation search range of an encoder has
- great influence over final picture quality. At a certain point a very
- large range can actually become detrimental (it may encourage large
- differential motion vectors). Practical ranges are usually between
- +/- 15 and +/- 32. As the range doubles, for instance, the search area
- quadruples.
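-
- The quadrupling can be checked with a little arithmetic. The sketch
- below counts integer-pel candidate positions for an exhaustive search
- over a square +/- R window (half-pel refinement is ignored, and the
- function name is made up for this example).
-
```python
def search_positions(search_range: int) -> int:
    """Number of integer-pel candidates in a +/- search_range window."""
    return (2 * search_range + 1) ** 2

for r in (15, 30):
    print(r, search_positions(r))
```
-
- Going from +/- 15 to +/- 30 raises the count from 961 to 3721
- candidates per macroblock, a little under four times as many.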
-
- Rate control marks a second tell-tale area where some encoders perform
- significantly better than others.
-
- And finally, the degree of "pre-processing" (now a popular buzzword in the
- business) signals that the encoder belongs to an elite marketing class.
-
- 8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
-
- A. Yes. The MPEG-1 syntax permits sampling dimensions as high as 4095 x
- 4095 x 60 frames per second. The MPEG most people think of as "MPEG-1"
- is really a kind of subset known as Constrained Parameters bitstream (CPB).
-
- 9. What are Constrained Parameters Bitstreams (CPB) for video?
-
- A. MPEG-1 CPB are a limited set of sampling and bitrate parameters
- designed to normalize decoder computational complexity, buffer size, and
- memory bandwidth while still addressing the widest possible range of
- applications. The parameter limits were intentionally designed so that a
- decoder implementation would need only 4 Megabits of DRAM.
-
- Parameter        Limit
- --------------   ---------------------------
- pixels/line      704
- lines/picture    480 or 576
- pixels*lines     352*240 or 352*288
- picture rate     30 Hz
- bit rate         1.862 million bits/sec
- buffer size      40 Kilobytes (327,680 bits)
-
- The sampling limits of CPB are bounded at the ever popular SIF rate:
- 396 macroblocks (101,376 pixels) per picture if the picture rate is
- less than or equal to 25 Hz, and 330 macroblocks (84,480 pixels) per
- picture if the picture rate is 30 Hz. The MPEG nomenclature loosely
- defines a "pixel" or "pel" as a unit vector containing a complete
- luminance sample and one fractional (0.25 in 4:2:0 format) sample from
- each of the two chrominance (Cb and Cr) channels. Thus, the
- corresponding bandwidth figure can be computed as:
-
- 352 samples/line x 240 lines/picture x 30 pictures/sec x 1.5 samples/pixel
-
- or 3.8 Ms/s (million samples/sec) including chroma, but not including
- blanking intervals. Since most decoders are capable of sustaining
- VLC decoding at a faster rate than 1.8 Mbit/sec, the coded video bitrate
- has become the most often waived parameter of CPB. An encoder which
- intelligently employs the syntax tools should achieve SIF quality saturation
- at about 2 Mbit/sec, whereas an encoder producing streams containing
- only I (Intra) pictures might require as much as 4 Mbit/sec to achieve the
- same video quality.
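-
- The bandwidth figure quoted above can be reproduced directly. This is
- just the multiplication from the text written out, with 1.5 samples per
- pixel standing for one luminance sample plus a quarter sample from each
- of Cb and Cr (the function name is illustrative).
-
```python
def sample_rate(width, height, picture_rate, samples_per_pixel=1.5):
    """Samples per second for active picture area, including chroma."""
    return width * height * picture_rate * samples_per_pixel

print(sample_rate(352, 240, 30))   # 30 Hz SIF
print(sample_rate(352, 288, 25))   # 25 Hz SIF
```
-
- Both SIF variants come to 3,801,600 samples/sec, which is the 3.8 Ms/s
- figure given above.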
-
- 10. Why is Constrained Parameters so important?
-
- A. It is an optimum point that allows (just barely) cost effective
- VLSI implementations in 1992 technology (0.8 microns). It also
- implies a nominal guarantee of interoperability for decoders and
- encoders. Since CPB is the most popular canonical conformance point,
- MPEG devices which are not capable of at least meeting SIF rates are
- usually not considered to be true MPEG.
-
- Picture buffers (i.e. "frame stores") and coded data buffering
- requirements for MPEG-1 CPB fit just snugly into 4 Mbit of memory (DRAM).
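-
- A back-of-the-envelope check of the "just snugly" claim follows. It
- ASSUMES three frame stores (two reference pictures plus one B picture)
- at the larger 352 x 288 SIF size, 8 bits per sample in 4:2:0, plus the
- 327,680-bit coded data buffer. Real decoders may budget memory
- differently.
-
```python
FRAME_STORES = 3                   # assumption: 2 references + 1 B picture
WIDTH, HEIGHT = 352, 288           # larger of the two SIF formats
BITS_PER_PIXEL = 8 * 1.5           # 8-bit samples, 4:2:0 chroma
VBV_BITS = 327_680                 # CPB buffer size from the table above
DRAM_BITS = 4 * 1024 * 1024        # 4 Mbit DRAM

frame_bits = int(WIDTH * HEIGHT * BITS_PER_PIXEL)
total_bits = FRAME_STORES * frame_bits + VBV_BITS
print(frame_bits, total_bits, DRAM_BITS - total_bits)
```
-
- Under these assumptions the total is 3,977,216 bits, leaving roughly
- 217 Kbit of the 4,194,304 available.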
-
- 11. Who uses constrained parameters bitstreams?
-
- A. Principal CPB applications are Compact Disc video (White Book or CD-I) and
- desktop video. Set-top TV decoders fall into a higher sampling rate
- category known as "CCIR 601" or "Broadcast rate," which as a rule of
- thumb, has sampling dimensions and bandwidth 4 times that of SIF or
- CPB.
-
-
- 12. Are there ways of circumventing constrained parameters bitstreams for SIF
- class applications and decoders ?
-
- A. Yes, some. Remember that CPB limits pictures by macroblock count.
- 416 x 240 x 24 Hz sampling rates are still within the constraints, but this
- would only be of benefit in NTSC (240 lines/field) displays. Deviating from
- 352 samples/line could throw off many decoder implementations which possess
- limited horizontal sample rate conversion abilities. Some decoders do in fact
- include a few rate conversion modes, with a filter usually implemented via
- binary taps (shifts and adds). Likewise, the target sample rates are usually
- limited to a few fixed values or ratios (e.g. 640, 540, 480 pixels/line, etc.). Future MPEG
- decoders will likely include on-chip arbitrary sample rate converters,
- perhaps capable of operating in the vertical direction (although there is
- little need of this in applications using standard TV monitors, with the
- possible exception of windowing in cable box graphical user interfaces).
-
- Also, many CD videos are letterboxed at the 16:9 aspect ratio. The
- actual coded and display sampling dimensions are 384 x 216 (note
- 384/216 = 16/9). These videos are typically movies and are therefore coded
- at 24 frames/sec.
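-
- The macroblock-count reasoning can be sketched as below. The limits
- used are the 396 macroblocks/picture figure quoted earlier and the
- macroblock rate it implies (396 x 25 = 330 x 30 = 9900 per second).
- Only these two limits are checked; the per-line, per-picture, bitrate
- and buffer limits of CPB are ignored, and the function names are
- hypothetical.
-
```python
import math

MAX_MB_PER_PICTURE = 396
MAX_MB_PER_SECOND = 396 * 25       # same as 330 * 30

def macroblocks(width, height):
    """16x16 macroblocks needed to cover the picture (partial rows round up)."""
    return math.ceil(width / 16) * math.ceil(height / 16)

def within_cpb_sampling(width, height, picture_rate):
    mbs = macroblocks(width, height)
    return (mbs <= MAX_MB_PER_PICTURE and
            mbs * picture_rate <= MAX_MB_PER_SECOND)

for dims in ((352, 240, 30), (352, 288, 25), (416, 240, 24),
             (384, 216, 24), (352, 288, 30)):
    print(dims, macroblocks(dims[0], dims[1]), within_cpb_sampling(*dims))
```
-
- The 416 x 240 x 24 Hz and 384 x 216 x 24 Hz cases pass (390 and 336
- macroblocks respectively), while 352 x 288 at 30 Hz exceeds the
- macroblock rate.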
-
- 13. Are there any other conformance points like CPB for MPEG-1?
- A. Undocumented ones, yes. A second generation of decoder chips
- emerged on the market about 1 year after the first wave of
- SIF-class decoders. Both LSI Logic and SGS-Thomson introduced CCIR
- 601 class MPEG-1 video decoders to fill in the gap between canonical
- MPEG-1 (SIF) and the emergence of Main Profile at Main Level (CCIR
- 601) MPEG-2 decoders. Under non-disclosure agreement, C-Cube had the
- CL-950, although since Q2'94, the CL-9100 is now the full MPEG-2
- successor in production.
-
- 14. What frame rates are permitted in MPEG?
- A. A limited set is available for the choosing in MPEG-1, although "tricks"
- could be played with Systems-layer Time Stamps to convey non-standard picture
- rates. The set is: 23.976 Hz (3-2 pulldown NTSC), 24 Hz (Film),
- 25 Hz (PAL/SECAM or 625/50 video), 29.97 (NTSC), 30 Hz (drop-frame NTSC
- or component 525/60), 50 Hz (double-rate PAL), 59.94 Hz (double rate NTSC),
- and 60 Hz (double-rate drop-frame NTSC/component 525/60 video).
-
- Only 23.976, 24, 25, 29.97, and 30 Hz are within the conformance space of
- Constrained Parameters Bitstreams and Main Level.
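-
- The rates ending in .976, .97 and .94 are really ratios over 1001. A
- small lookup table makes this explicit. The code numbers 1 to 8 follow
- the 4-bit rate code carried in the sequence header; the dictionary name
- is just for this sketch.
-
```python
from fractions import Fraction

FRAME_RATES = {
    1: Fraction(24000, 1001),   # 23.976 Hz
    2: Fraction(24),
    3: Fraction(25),
    4: Fraction(30000, 1001),   # 29.97 Hz
    5: Fraction(30),
    6: Fraction(50),
    7: Fraction(60000, 1001),   # 59.94 Hz
    8: Fraction(60),
}

for code, rate in FRAME_RATES.items():
    print(code, f"{float(rate):.3f} Hz")
```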
-
- 15. How are various levels of interlace and progressive prediction modes
- organized in MPEG-2 ?
-
-
-                  MPEG-2 sequence
-                 /               \
-       progressive             interlaced
-        sequence                sequence
-                               /         \
-                    Field picture       Frame picture
-                                        /           \
-                      Frame or field pred.     Frame MB prediction only
-                            /      \
-                     Field dct    Frame dct
-
-
- 16. What is MPEG-2 Video Main Profile and Main Level?
-
- A. MPEG-2 Video Main Profile and Main Level is analogous to MPEG-1's CPB, with
- sampling limits at CCIR 601 parameters (720x480x30 Hz or 720x576x25
- Hz). "Profiles" limit syntax (i.e. algorithms), whereas "Levels"
- limit coding parameters (sample rates, frame dimensions, coded
- bitrates, etc.). Together, Video Main Profile and Main Level
- (abbreviated as MP@ML) normalize complexity within feasible limits of
- 1994 VLSI technology (0.5 micron), yet still meet the needs of the
- majority of application users. MP@ML is the conformance point for most
- cable and satellite TV systems, for example.
-
- Profiles
- ======
- Simple: I and P pictures only. 4:2:0 sampling ratio. 8, 9, or 10 bits DC
- precision. Originally intended for use in cable TV applications
- not wanting to spend more than 8 Mbits of DRAM.
-
- Main: I, P, and B pictures. Dual Prime only in P pictures with no
- intervening B-pictures. 4:2:0 sampling ratio. 8, 9, or 10 bits sample
- precision.
-
- SNR: scalable coding.
-
- Spatial: scalable coding.
-
- High: 8,9,10, or 11 bits sample precision. 4:2:2 and 4:4:4 sampling
- ratio.
-
-
- Level
- ====
- Low: SIF video rate (3.041280 MHz), 4 Mbit/sec, 0.489472 Mbit
- VBV buffer, 64 vertical in frame, 32 Vertical in field,
- 1:7 fcode hor.
-
- Main: CCIR 601 video rate (10.368 MHz), 15 Mbit/sec, 1.835008 Mbit VBV
- buffer, 128 V in frame, 64 V in field, 1:8 f_code Hor.
-
- High 1440: 1440 x 1152 x 30 Hz (47.0016 MHz), 60 Mbit/sec. 7.340032 Mbit
- VBV buffer, 128 V in Fe, 1:9 fcode H.
-
- High: 1920 x 1152 x 30 Hz (62.6688 MHz), 80 Mbit/sec. 9.787392 Mbit VBV
- buffer. 1:9 fcode H
-
- 17. Does anybody actually use the scalability modes?
-
- A. At this time, scalability has found itself a limited number of
- applications, although research is definitely underway for its use in
- HDTV. Experiments have been demonstrated in Europe where, for example,
- PAL-rate video (720 x 576 x 25 fps) is embedded in the same stream as
- HDTV rate video (1440 x 1152 x 25 fps). The Nov. 1992 VADIS experiment
- divided the base layer (PAL) and enhancement into 4 and 16 Mbit/sec
- channels, respectively. The U.S. Grand Alliance favors HDTV
- simulcasting (separate NTSC analog and digital MPEG-2 HDTV
- broadcasts). Temporal scalability is the pet scalability mode as the
- possible future solution for coding 60 Hz progressive sequences while
- maintaining backwards compatibility with early-wave equipment (e.g.
- 1920 x 1080 x 30 Hz displays) . To elaborate, the first wave receivers
- of the late 1990s are expected to have decoders limited to 60
- field/sec interlaced and 30 frame/second progressive reconstruction
- rates. With temporal scalability applied, 60 Hz interlaced fields would
- be coded into a 16 Mbit/sec stream starting around 1996, and later an
- 8 Mbit/sec enhancement layer would be simulcast, effectively
- containing the coded "high pass" between 60 Hz progressive and 60 Hz
- interlaced. Several corporate mouths have been known to water at the
- mention of charging the quality-conscious subscriber an extra fee for
- the enhancement layer.
-
- 18. What's the difference between Field and Frame pictures?
-
- A. A frame-coded picture consists of interleaved lines from both the even
- and odd display fields. A frame picture is coded in progressive order (an
- even line, then an odd line, etc.) and in the case of MPEG-2, may
- optionally switch between field and frame coding on a macroblock basis.
- The Display Process, which is *almost* completely outside the scope of the
- MPEG specification, can choose to re-interlace the picture by displaying the
- odd and even lines at different times (16 milliseconds apart for 60 Hz
- displays). In fact, most pictures, regardless of whether they were coded
- as a Field or Frame, end up being displayed interlaced due to the fact
- that most TV displays are interlaced.
-
- 19. What do B-pictures buy you?
-
- A. Since bi-directional macroblock predictions are an average of two
- macroblock areas, noise is reduced at low bit rates (like a 3-D filter, if
- you will). At nominal MPEG-1 video (352 x 240 x 30, 1.15 Mbit/sec) rates, it
- is said that B-frames improve SNR by as much as 2 dB. (0.5 dB gain is
- usually considered worth-while in MPEG). However, at higher bit rates, B-
- frames become less useful since they inherently do not contribute to the
- progressive refinement of an image sequence (i.e. not used as prediction by
- subsequent coded frames). Regardless, B-frames are still politically
- controversial.
-
- B pictures are interpolative in two ways: 1. predictions in the bi-
- directional macroblocks are an average from block areas of two
- pictures 2. B pictures "fill in" or interpolate the 3-D video signal
- over a 33 or 25 millisecond picture period without contributing to the
- overall signal quality beyond that immediate point in time. In other
- words, a B picture, regardless of its internal make-up of macroblock
- types, has a life limited to its immediate self. As mentioned before,
- its energy does not propagate into other frames. In a sense, bits
- spent on B pictures are wasted.
-
-
- 20. Why do some people hate B-frames?
-
- A. Computational complexity, bandwidth, delay, and picture buffer size are
- the four B-frame Pet Peeves. Computational complexity in the decoder is
- increased since some macroblock modes require averaging between two
- block predictions.
-
- Worst case, memory bandwidth is increased an extra 15.2 MByte/s
- (single channel 4:2:0 601 rates, not including any half pel or
- page-mode overhead) for this extra directional prediction. An extra
- picture buffer is needed to store the future prediction reference
- (backwards prediction). Finally, an extra picture delay is introduced
- in the decoder since the frame used for backwards prediction needs to be
- transmitted to the decoder and reconstructed before the intermediate
- B-pictures can be decoded and displayed.
-
- Cable television companies (e.g. -- more like i.e.-- General Instruments) have
- been particularly averse to B-frames since, for CCIR 601 rate video,
- the extra picture buffer pushes the decoder DRAM memory requirements
- past the magic 8- Mbit (1 Mbyte) threshold into the evil realm of 16
- Mbits (2 Mbyte).... although 8-Mbits is fine for 352 x 480 B picture
- sequence. However, cable often forgets that DRAM does not come in
- convenient high-volume (low cost) 8-Mbit packages as do the friendly
- 4-Mbit and 16-Mbit packages. In a few years, the cost difference
- between 16 Mbit and 8 Mbit will become insignificant compared to the
- bandwidth savings gained through higher compression. For the time
- being, some cable boxes will start with 8-Mbit and allow future
- drop-in upgrades to the full 16-Mbit.
-
- 21. Why was the 16x16 area chosen?
-
- A. The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8
- blocks, given the normative 4:2:0 chroma ratio. Starting with medium
- size images, the 16x16 area provides a good balance between side
- information overhead & complexity and motion compensated prediction
- accuracy. In gist, 16x16 seemed like a good trade-off.
-
- 22. Why was the 8x8 DCT size chosen?
- A. Experiments showed little compaction gains could be achieved with larger
- sizes, especially when considering the increased implementation
- complexity. A fast DCT algorithm will require roughly double the
- arithmetic operations per sample when the linear transform point
- size is doubled. Naturally, the best compaction efficiency has been
- demonstrated using locally adaptive block sizes (e.g. 16x16, 16x8,
- 8x8, 8x4, and 4x4) [See Gary Sullivan and Rich Baker "Efficient Quadtree
- Coding of Images and Video," ICASSP 91, pp 2661-2664.].
-
- Inevitably, this introduces additional side information overhead and
- forces the decoder to implement programmable or hardwired recursive
- DCT algorithms. If the DCT size becomes too large, then more edges
- (local discontinuities) and the like become absorbed into the
- transform block, resulting in wider propagation of Gibbs (ringing) and
- other phenomena. Finally, with larger transform sizes, the DC term is
- even more critically sensitive to quantization noise.
-
-
- 23. What is motion compensated prediction, and why is it a pain?
-
- A. MCP in the decoder can be thought of as having four stages:
-
- 1. Motion vector computation
- 2. Prediction retrieval
- various predictions are 16x16, 16x8, 8x4, 8x8 plus any half-pel
- overhead (e.g. 17x16, 17x17, etc).
- 3. Filtering
- 3.1 Forming half-pel predictions through bi-linear interpolation.
- 3.2 Averaging two predictions together (B macroblocks, Dual Prime)
- 4. Combination and ordering
- 4.1 combining 1 or 2 predictions from stage 3. into upper and
- lower halves of the macroblock (e.g.16 x 8, field in frame)
- 4.2 interleaving or grouping together odd and even lines in frame
- coded pictures (dct_type).
-
- The final, combined prediction that is sent to the frame buffer is always
- a 16x16 block of luminance and an 8x8 block of chrominance, just like we
- experience in MPEG-1.
-
- A single motion vector can be associated with each block prediction source,
- hence a macroblock can have as many as 4 motion vectors (each in turn having
- a horizontal and vertical element).
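-
- Stages 2 and 3.1 (prediction retrieval and half-pel filtering) can be
- sketched as follows. The motion vector is given in half-pel units, the
- reference picture is a plain list of rows, and the usual rounded
- bi-linear average is used. The function name and argument layout are
- inventions for this example, and the vector is assumed to stay inside
- the reference picture.
-
```python
def half_pel_prediction(ref, x2, y2, width=16, height=16):
    """Form a width x height prediction at half-pel position (x2/2, y2/2)."""
    x, y = x2 >> 1, y2 >> 1        # integer-pel part of the position
    hx, hy = x2 & 1, y2 & 1        # half-pel flags (0 or 1)
    pred = []
    for j in range(height):
        row = []
        for i in range(width):
            a = ref[y + j][x + i]
            b = ref[y + j][x + i + hx]
            c = ref[y + j + hy][x + i]
            d = ref[y + j + hy][x + i + hx]
            # reduces to a, (a+b+1)>>1 or the 4-sample average as needed
            row.append((a + b + c + d + 2) >> 2)
        pred.append(row)
    return pred

ref = [[0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]]
print(half_pel_prediction(ref, 1, 1, width=2, height=2))
print(half_pel_prediction(ref, 2, 0, width=2, height=2))
```
-
- Note that when a half-pel flag is set, one extra column or row of the
- reference is read, which is where the 17x16 and 17x17 retrieval sizes
- mentioned above come from.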
-
- 24. What are the various prediction modes in MPEG-2?
-
-
-                     Filtered pixel   Count for mono-directional prediction
- MB prediction mode  block size       a. Blocks    b. Motion vectors
-
- Frame pictures:
-   Frame             16x16            1            1
-   Field             16x8             2            2
-   Dual Prime        16x8             2            2 + dmv
-
- Field pictures:
-   Field             16x16            1            1
-   16x8              16x8             2            2
-   Dual Prime        16x16            2            1 + dmv
-
-
- 24.1 Frame:
- Predictions are formed from a 16 x 16 pixel area in a previously
- reconstructed frame. Identical to MPEG-1. There can be only one
- filtered 16x16 block prediction source in forward or backward
- predicted macroblocks, and two sources in bi-directional macroblocks.
- The prediction frame itself may have been coded as either a frame or
- two fields, however once a frame is reconstructed, it is simply a
- frame as far as future pictures are concerned.
-
- 24.2 Field predictions in frame-coded pictures:
- Separate predictions are formed for the top (8 lines from odd field)
- and bottom (8 lines from the even field) portions of the macroblock.
- The separate field predictions are specified by two motion vectors,
- each of which may select either the top or bottom field of the
- reference picture (field_select). A total of two motion vectors can be
- specified in the macroblock header for forward or backward predictions,
- and a total of four in bi-directional macroblocks. Both the top and bottom
- field blocks share the same macroblock type side information.
-
- 24.3 Field predictions in field-coded pictures:
-
- Predictions are formed from the two most recently decoded fields. Filtered
- block prediction sizes are 16x16, however the 16 lines have a corresponding
- projection onto a 16x32 pixel area of the stored frame. One motion vector
- for each forward or backward prediction, (total of two for bi-directional.)
-
- 24.4 16x8 predictions in field-coded pictures:
-
- Like field macroblocks in frame-coded pictures, the upper and lower 8
- lines in this macroblock mode can have different predictions (hence
- two motion vectors). This mode compensates for the reduced temporal
- prediction precision of field picture macroblocks (a result of the fact
- that fields inherently possess half the number of lines that frames
- do). Thanks to 16x8, the field prediction area projected onto a frame
- is restored to 16 lines. Total of 2 motion vectors for backwards or
- forwards, 4 for bi-directional.
-
- 24.5 Dual Prime prediction in frame and field-coded pictures
-
- Predictions for the current macroblock are formed from the average of
- two 16 x 8 line areas from the two most recently decoded fields. Dual
- Prime was devised as an alternative for B pictures in low delay
- applications, but still offers many of the prediction and signal
- quality benefits of B-pictures. Although the Dual Prime mode
- requires one less prediction picture buffer than B pictures, it still
- retains the same instantaneous prediction bandwidth as B pictures.
-
- As an alternative to coding separate motion vectors for each of the
- upper and lower 16x8 areas, a full motion vector is specified which is
- used to form the address of the prediction block which has the same
- parity as the macroblock field (top or bottom 16x8 region) currently
- being reconstructed. A horizontal and vertical differential vector
- with a range of +1, 0, or -1 (which is not predicted but is variable
- length coded) is added to the scaled vector of the same parity.
-
- vector_of_the_same_parity = transmitted vector
- vector_of_the_opposite_parity = scaled_transmitted_vector +
- differential vector
-
- A Dual Prime (aka DMC) macroblock in a frame-coded picture will have a total of
- two full motion vectors (same parity) and two differential vectors (DMV).
- A DMC MB in Field-coded pictures has 1 full motion vector, and 1 DMV.
- Due to the high prediction bandwidth overhead, Main Profile restricts the
- use of Dual Prime prediction to immediate P picture sequences only. High
- Profile permits use of Dual Prime in B pictures. However in Main
- Profile, B pictures may follow or precede P pictures with Dual Prime,
- as long as there are no B pictures in between the reference picture
- and any P pictures with at least one Dual Prime macroblock.
-
- I DP DP B B P DP DP is legal
- I DP DP B B DP DP DP is illegal.
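-
- One way to read the rule is sketched below. It ASSUMES the two example
- sequences are written in display order and that "DP" stands for a P
- picture containing at least one Dual Prime macroblock, so a DP picture
- is illegal whenever a B picture sits directly before it. This is an
- interpretation of the text above, not a conformance checker.
-
```python
def dual_prime_legal(display_order):
    """display_order: picture types such as ["I", "DP", "B", "P"]."""
    previous = None
    for picture in display_order:
        # A B picture right before a DP picture lies between the DP
        # picture and its reference, which Main Profile forbids.
        if picture == "DP" and previous == "B":
            return False
        previous = picture
    return True

print(dual_prime_legal("I DP DP B B P DP DP".split()))
print(dual_prime_legal("I DP DP B B DP DP DP".split()))
```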
-
- 24.6 Field and frame organized macroblocks:
-
- Originally intended as a cheaper means of achieving
- field-decorrelation in frame-coded pictures without the fussy overhead
- of separate field prediction estimates, the dct coefficients (coded
- quantized prediction error for a given macroblock) may be organized
- into either a field or frame pattern. Essentially this means that the
- prediction error for the combined 16x16 macroblock may be grouped into
- field or frame blocks. A bit in the macroblock header (dct_type)
- indicates whether the upper and lower portions of the macroblock are
- to be interleaved (frame organized) or remain separated (field
- organized).
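-
- The two organizations can be pictured with a small sketch that splits a
- 16x16 luminance macroblock into its four 8x8 blocks, as an encoder would
- before the DCT (a decoder performs the inverse). In the field-organized
- case the even (top field) lines are gathered into the upper two blocks
- and the odd (bottom field) lines into the lower two. The helper name is
- made up for this example.
-
```python
def split_luma_blocks(mb, field_organized):
    """Split a 16x16 macroblock (list of 16 rows) into four 8x8 blocks."""
    if field_organized:
        # top-field lines (0, 2, 4, ...) first, then bottom-field lines
        rows = mb[0::2] + mb[1::2]
    else:
        rows = mb
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([r[left:left + 8] for r in rows[top:top + 8]])
    return blocks

mb = [[line] * 16 for line in range(16)]   # each sample holds its line number
frame_blocks = split_luma_blocks(mb, field_organized=False)
field_blocks = split_luma_blocks(mb, field_organized=True)
print([row[0] for row in frame_blocks[0]])   # lines in the first frame block
print([row[0] for row in field_blocks[0]])   # lines in the first field block
```
-
- The first frame-organized block holds lines 0 through 7, while the
- first field-organized block holds lines 0, 2, 4, ..., 14.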
-
- 25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
-
- A. All MPEG-2 bitstreams must contain specific extension headers that
- *immediately* follow MPEG-1 headers. At the highest layer, for
- example, the MPEG-1 style sequence_header() is followed by
- sequence_extension() exclusive to MPEG-2. Some extension headers are
- specific to MPEG-2 profiles. For example, sequence_scalable_extension()
- is not allowed in Main Profile bitstreams.
-
- A simple program need only scan the coded bitstream for byte-aligned start
- codes to determine whether the stream is MPEG-1 or MPEG-2.
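-
- Such a program might look like the sketch below. It finds the first
- sequence header start code (0x000001B3) and checks whether the very next
- start code is an extension start code (0x000001B5). The function name is
- hypothetical and the test data are toy byte strings, not real
- bitstreams.
-
```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xB3"
EXTENSION_START_CODE = 0xB5

def guess_mpeg_version(data: bytes):
    """Return 1 or 2, or None if no sequence header is found."""
    seq = data.find(SEQUENCE_HEADER_CODE)
    if seq == -1:
        return None
    nxt = data.find(b"\x00\x00\x01", seq + 4)   # next start code prefix
    if nxt != -1 and nxt + 3 < len(data) and data[nxt + 3] == EXTENSION_START_CODE:
        return 2
    return 1

header_payload = b"\xFF" * 8                     # stand-in for header fields
mpeg1_like = SEQUENCE_HEADER_CODE + header_payload + b"\x00\x00\x01\xB8"
mpeg2_like = SEQUENCE_HEADER_CODE + header_payload + b"\x00\x00\x01\xB5\x14"
print(guess_mpeg_version(mpeg1_like), guess_mpeg_version(mpeg2_like))
```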
-
- 26. What is the reasoning behind MPEG syntax symbols?
-
- A. Here are some of the Whys and Wherefores of MPEG symbols:
-
- Start codes:
- These 32-bit byte-aligned codes provide a mechanism for cheaply
- searching coded bitstreams for commencement of various layers of video
- without having to actually parse variable-length codes or perform any
- decoder arithmetic. Start codes also provide a mechanism for
- resynchronization in the presence of bit errors.
-
- Coded block pattern:
- (CBP --not to be confused with Constrained Parameters!) When the
- frame prediction is particularly good, the displaced frame difference
- (DFD, or temporal macroblock prediction error) tends to be small,
- often with entire block energy being reduced to zero after
- quantization. This usually happens only at low bit rates. Coded
- block patterns prevent the need for transmitting EOB symbols in those
- zero coded blocks. Coded block patterns are transmitted in the
- macroblock header only if the macroblock_type flag indicates so.
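-
- A minimal sketch of how the pattern could be derived for a 4:2:0
- macroblock follows. It ASSUMES the conventional bit ordering in which
- the first luminance block maps to the most significant of the six bits.
- The all-zero case is not one of the 63 coded states, since it is
- signaled through the macroblock type instead.
-
```python
def coded_block_pattern(blocks):
    """blocks: six coefficient lists in the order Y0, Y1, Y2, Y3, Cb, Cr.
    Returns the 6-bit pattern; 0 means no block carries coefficients."""
    cbp = 0
    for i, block in enumerate(blocks):
        if any(coeff != 0 for coeff in block):
            cbp |= 1 << (5 - i)      # assumed ordering: Y0 is the top bit
    return cbp

zero = [0] * 64
coded = [3] + [0] * 63
print(coded_block_pattern([coded, zero, zero, zero, zero, coded]))
```
-
- Here only the first luminance block and the Cr block carry data, giving
- binary 100001 (decimal 33). Blocks whose bit is clear send nothing at
- all, not even an EOB.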
-
- constant DC stepsize:
- Each block of an INTRA CODED macroblock has a quantized DC coefficient
- which is differentially coded with the previous DC block value of the
- same component. The quantization stepsize is fixed for the duration
- of the picture, and is indicated in the picture_extension_header() in
- MPEG-2. In MPEG-1, the stepsize is always "8". A fixed size is used
- because of the very fact that the value is predicted.
-
-
- DCT_coefficient_first:
- With coded block patterns in NON-INTRA macroblocks signaling all
- possible combinations of all-zero valued blocks, the dct_coef_first
- mechanism assigns a different meaning to the VLC codeword (run = 0,
- level =+/- 1) that would otherwise represent EOB (10) as the first
- coefficient in the zig-zag sequence.
-
- End of Block:
- Saves unnecessary run-length codes. At optimal bitrates, there tend to
- be few AC coefficients concentrated in the early stages of the zig-zag
- vector. In MPEG-1, the 2-bit length of EOB implies that there is an
- average of only 3 or 4 non-zero AC coefficients per block. In MPEG-2
- Intra (I) pictures, with a 4-bit EOB code in Table 1, this estimate is
- between 9 and 16 coefficients. Since EOB is required for all coded
- blocks, its absence can signal that a syntax error has occurred in the
- bitstream.
-
- Macroblock stuffing:
- A genuine pain for VLSI implementations, macroblock stuffing was
- introduced to maintain smoother, constant bitrate control in
- MPEG-1. However, with normalized complexity/activity measures and
- buffer management performed a priori (before coding of the macroblock,
- for example) and local monitoring of coded data buffer levels now
- common operations in encoders, (e.g. MPEG-2 encoder Test Model), the
- need for such localized smoothing evaporated. Stuffing can be achieved
- through slice start code padding if required. A good rule of thumb: if
- you often find yourself wishing for stuffing more than once per slice,
- you probably don't have a very good rate control algorithm. Nonetheless,
- to avoid any temptation, macroblock stuffing is now illegal in MPEG-2
- (All profiles!)
-
- MPEG's modified Huffman VLC tables:
- The VLC tables in MPEG are not Huffman tables in the true sense of
- Huffman coding, but are more like the tables used in Group 3 fax. They
- are entropy constrained, that is, non-downloadable and optimized for a
- limited range of bit rates (sweet spots). With the exception of a few
- codewords, the larger tables were carried over from the H.261 standard
- drafted in the year 1990. MPEG-2 added an "Intra table," also called
- "Table 1". Note that the dct_coefficient tables assume positive/negative
- coefficient pmf symmetry.
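-
- One possible reading of the End of Block estimate above (an assumption,
- not something this FAQ spells out): a well-matched VLC gives an n-bit
- codeword to a symbol of probability about 2^-n, and EOB occurs exactly
- once per coded block, so a block holds roughly 2^n symbols, of which all
- but one are run-level events. The sketch below just does that arithmetic;
- expected_events_per_block is a hypothetical name.
-
```python
def expected_events_per_block(eob_bits: int) -> int:
    """Estimate run-level events per coded block from the EOB codeword length."""
    p_eob = 2.0 ** -eob_bits            # implied probability of the EOB symbol
    symbols_per_block = 1.0 / p_eob     # one EOB per block => this many symbols
    return round(symbols_per_block) - 1  # everything except the EOB itself


for bits in (2, 4):
    print(bits, "bit EOB ->", expected_events_per_block(bits), "events")
```
-
- This gives 3 events for a 2-bit EOB and 15 for a 4-bit EOB, in line with
- the "3 or 4" and "between 9 and 16" figures quoted above.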
-
-
- 27. Why bother to research compressed video when there is a standard?
- A. Despite the fact that a comprehensive worldwide standard now exists for
- digital video, many areas remain wide open for research: advanced
- encoding and pre-processing, motion estimation, macroblock decision
- models, rate control and buffer management in editing environments,
- implementation complexity reduction, etc. Many areas have yet to
- be solved ... (and discovered)..
-
- 28. Where can I get a copy of the latest MPEG-2 draft?
-
- Contact your national standards body (e.g. ANSI Sales in New York City for
- the U.S., British Standards Institute in the UK, etc.). A number of private
- organizations such as Globecom offer ISO documents.
-
- 29. What are the latest working drafts of MPEG-2 ?
-
- To date (March 1994), MPEG-2 has reached the voting-document stage of the
- Draft International Standard for:
-
- Information Technology -- Generic Coding of Moving Pictures and
- Associated Audio. Recommendation H.262, ISO/IEC Draft International Standard
- 13818 Part 2: Video [produced March 25, 1994, not yet approved by national
- standards body voting process].
-
- Systems and Audio are still at Committee Draft -- Systems: Part 1 (13818-1),
- and Audio is Part 3 (13818-3). However, a revised document was produced at
- the Paris meeting (March 24, 1994).
-
- A committee draft for Conformance (Part 4) is expected in November 1994,
- as well as the Technical Report on Software Simulation (Part 5).
-
- Part 5 is in fact partly comprised of the MPEG Software Simulation Code.
-
- In addition, several new parts to MPEG-2 were approved at the Paris
- meeting:
-
- Part 6: DSMCC (Digital Storage Medium Command and Control)
- Part 7: Audio Extension for Non-Backwards Compatible Coding
-
- 30. What is the latest version of the MPEG-1 documents?
-
- A. Systems (ISO/IEC IS 11172-1), Video (ISO/IEC IS 11172-2), and Audio
- (ISO/IEC IS 11172-3) have reached the final document stage: the
- International Standard. Part 4, Conformance Testing, is currently DIS
- (ISO/IEC DIS 11172-4).
-
-
- 31. What is the evolution of ISO standard documents?
-
- A. In chronological order:
-
- ISO/Committee notation Author's notation
- --------------------------------------- -------------------------
- Problem (unofficial first stage) Barroom Witticism
- New work Item (NI) Napkin Item
- New Proposal (NP) Need Permission
- Working Draft (WD) We're Drunk
- Committee Draft (CD) Calendar Deadlock
- Draft International Standard (DIS) Doesn't Include Substance
- International Standard (IS) Induced patent Statements
-
- 32. Where is a good introductory paper to MPEG?
-
- Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia
- Applications," Communications of the ACM, April 1991, Vol.34, No.4, pp. 47-58
-
- 33. What are some journals on related MPEG topics ?
-
- IEEE Transactions on Consumer Electronics
- IEEE Transactions on Broadcasting
- IEEE Transactions on Circuits and Systems for Video Technology
- Advanced Electronic Imaging
- Electronic Engineering Times (EE Times -- more tabloid coverage.
- unfortunately contains attack columns by Richard Doherty)
- IEEE Int'l Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- International Broadcasting Convention (IBC)
- Society of Motion Pictures and Television Engineers (SMPTE) Journal
- SPIE conference on Visual Communications and Image Processing
- SPIE conference on Video Compression for Personal Computers
-
- 34. Is there a book on MPEG video?
-
- A. Yes, there will be a book published sometime in late 1994 by the
- same authors who brought you the JPEG book (Bill Pennebaker, Joan
- Mitchell). Didier Le Gall (MPEG Video subgroup chairman, 1989-1994)
- will be an additional co-author, ensuring that digressions into such
- areas as arithmetic coding are kept to a minimum :-)
-
- 35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
-
- Committee insiders most often use the Arabic notation with the
- hyphen, e.g. MPEG-2. Only the most retentive use the official
- designation: Phase 2. In fact, M.P.E.G. itself is a nickname. The
- official title is: ISO/IEC JTC1 SC29 WG11. The militaristic lingo
- has so far managed to keep the enemy (DVI) confused and out of the
- picture.
-
- ISO: International Organization for Standardization
- IEC: International Electrotechnical Commission
- JTC1: Joint Technical Committee 1
- SC29: Sub-committee 29
- WG11: Working Group 11 (moving pictures with... uh, audio)
-
- 36. What happened to MPEG-3?
-
- MPEG-3 was to have targeted HDTV applications with sampling dimensions
- up to 1920 x 1080 x 30 Hz and coded bitrates between 20 and 40
- Mbit/sec. It was later discovered that with some (syntax compatible)
- fine tuning, MPEG-2 and MPEG-1 syntax worked very well for HDTV rate
- video. The key is to maintain an optimal balance between sample rate
- and coded bit rate.
-
- Also, the standardization window for HDTV was rapidly closing. Europe
- and the United States were on the brink of committing to
- analog-digital subnyquist hybrid algorithms (D-MAC, MUSE, et al). By
- 1992, European all-digital projects such as HD-DIVINE and VADIS
- demonstrated better picture quality with respect to bandwidth using
- the MPEG syntax. In the United States, the Sarnoff/NBC/Philips/Thomson
- HDTV consortium had used MPEG-1 syntax from the beginning of its
- all-digital proposal, and with the exception of motion artifacts
- (due to limited search range in the encoder), was deemed to have the best
- picture quality of all three digital proponents in the early 1993 bake-off.
- HDTV is now part of the MPEG-2 High-1440 Level and High Level toolkit.
-
- 37. What is MPEG-4?
- MPEG-4 targets the Very Low Bitrate applications defined loosely as
- having picture sampling dimensions up to 176 x 144 (QSIF) and coded
- bit rates between 4800 and 64,000 bits/sec. This new standard would
- be used, for example, in low bit rate videotelephony/conferencing over
- analog telephone lines (Plain Old Telephone System or POTS).
-
- This effort is in the very early stages. Morphology, fractals, model
- based... even anal-retentive block transform coding are all in the
- offing. MPEG-4 is now in the application identification and requirements
- phase. The proposals process begins in November 1994.
-
-
- 38. What are the scaleable modes of MPEG-2?
-
- Scaleable video is permitted only in the Spatial and SNR profiles.
- Data partitioning and temporal scalability are not yet officially
- part of any defined Profile, and are therefore deemed experimental.
-
- Currently, there are four scaleable modes in the MPEG-2 toolkit. These
- modes break MPEG-2 video into different layers (base, middle, and high
- layers) mostly for purposes of prioritizing video data. For example,
- the high priority channel (bitstream) can be coded with a combination
- of extra error correction information and/or increased signal strength
- (i.e. higher Carrier- to-Noise ratio or lower Bit Error Rate) than the
- lower priority channel. For example, in HDTV, the high priority
- bitstream (720 x 480) can be decoded under noise conditions where the
- lower priority (1440 x 960) cannot. This is part of the "graceful
- degradation" concept. Breaking a video signal into two streams (base
- and enhancements) usually carries a coding-efficiency penalty,
- however, typically of less than 1.5 dB.
-
- Another purpose of scalability is Complexity Division. For example, a
- standard TV display need only have a 720 x 480 picture decoded, whereas
- a more expensive processor could decode 1440 x 960 video for HDTV
- displays. This application can be addressed through simulcasting of
- a Main Profile at Main Level bitstream serving as a base layer, and
- a separate spatial scalable enhancement stream at 1440 x 960 predicted
- from the upsampled base layer.
-
- Here is a brief summary of the MPEG-2 video scalability modes:
-
- Spatial Scalability-- Useful in simulcasting, and for feasible software
- decoding of the lower resolution, base layer. This spatial domain
- method codes a base layer at lower sampling dimensions (i.e.
- "resolution") than the upper layers. The upsampled reconstructed lower
- (base) layers are then used as prediction for the higher layers.
-
- Data Partitioning-- Similar to JPEG's frequency progressive mode, only
- the slice layer indicates the maximum number of block transform
- coefficients contained in the particular bitstream (known as the
- "priority break point"). Data partitioning is a frequency domain method
- that breaks the block of 64 quantized transform coefficients into two
- bitstreams. The first, higher priority bitstream contains the more
- critical lower frequency coefficients and side informations (such as DC
- values, motion vectors). The second, lower priority bitstream carries
- higher frequency AC data.
-
- SNR scalability-- SNR scalability is a frequency-domain method where
- channels are coded at identical sample rates, but with differing
- picture quality (achieved through macroblock quantization step sizes).
- The DCT coefficients of the higher priority base layer can be enhanced
- with DCT coefficients of lower priority enhancement layer.
-
- Temporal Scalability--- A temporal domain method useful in, e.g.,
- stereoscopic video. The first, higher priority bitstream codes video
- at a lower frame rate, and the intermediate frames can be coded in a
- second bitstream using the first bitstream reconstruction as prediction.
- In stereoscopic vision, for example, the left video channel can be
- predicted from the right channel.
-
- Other scalability modes were experimented with in MPEG-2 video (such as
- Frequency Scalability), but were eventually dropped in favor of methods
- that demonstrated comparable or better picture quality with greater
- simplicity.
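-
- To make the Data Partitioning idea concrete, here is a simplified Python
- sketch. It treats the priority break point as a plain count of zig-zag
- ordered coefficients kept in the high priority partition, which is an
- assumption made for illustration; the real syntax is more involved, and
- partition_block and merge_partitions are hypothetical names.
-
```python
def partition_block(zigzag_coeffs, priority_break_point):
    """Split 64 quantized coefficients (zig-zag order) into two partitions."""
    if len(zigzag_coeffs) != 64:
        raise ValueError("expected 64 coefficients")
    if not 1 <= priority_break_point <= 64:
        raise ValueError("break point must be in 1..64")
    high = list(zigzag_coeffs[:priority_break_point])   # DC + low frequencies
    low = list(zigzag_coeffs[priority_break_point:])    # higher-frequency AC data
    return high, low


def merge_partitions(high, low=None):
    """Rebuild the block; a lost low-priority partition becomes zeros."""
    if low is None:
        low = [0] * (64 - len(high))
    return list(high) + list(low)
```
-
- If the low priority bitstream is lost, merge_partitions(high) still yields
- a decodable (if softer) block, which is the graceful degradation the
- scalable modes are after.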
-
-
- 39. Why MPEG-2? Wasn't MPEG-1 enough?
-
- A. MPEG-1 was optimized for CD-ROM or applications at about 1.5
- Mbit/sec. Video was strictly non-interlaced (i.e. progressive). The
- international cooperation executed well enough for MPEG-1, that the committee
- began to address applications at broadcast TV sample rates using the
- CCIR 601 recommendation (720 samples/line by 480 lines per frame by 30
- frames per second or about 15.2 million samples/sec including chroma) as
- the reference.
-
- Unfortunately, today's TV scanning pattern is interlaced. This
- introduces a duality in block coding: do local redundancy areas (blocks)
- exist exclusively in a field or a frame (or a particle or a wave)? The
- answer of course is that some blocks are one or the other at different
- times, depending on motion activity. The additional man years of
- experimentation and implementation between MPEG-1 and MPEG-2 improved
- the method of block-based transform coding.
-
-
- 40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
- A. Here is a brief summary:
-
- Sequence layer:
- More aspect ratios. A minor, yet necessary part of the syntax.
-
- Horizontal and vertical dimensions are now required to be a multiple of
- 16 in frame coded pictures, and the vertical dimension must be a
- multiple of 32 in field coded pictures.
-
- 4:2:2 and 4:4:4 macroblocks were added in the Next profiles.
-
- Syntax can now signal frame sizes as large as 16383 x 16383.
-
- Syntax signals source video type (NTSC, PAL, SECAM, MAC, component) to
- help post-processing and display.
-
- Source video color primaries (709, 170M, 240M, D65, etc.) and
- opto-electronic transfer characteristics (709, 624-4M, 170M etc.) can be
- indicated.
-
- Four scaleable modes [see scalability discussion]
-
- Picture layer:
- All MPEG-2 motion vectors are specified to a half-pel sample grid.
-
- DC precision can be user-selected as 8, 9, 10, or 11 bits.
-
- New scalar quantization matrices may be downloaded once per picture. In High
- profile, separate chrominance matrices now exist (Y and C no longer have to
- share)
-
- Concealment motion vectors were added to I-pictures in order to increase
- robustness from bit errors. I pictures are the most critical and sensitive
- picture in a group of pictures.
-
- A non-linear macroblock quantization factor providing a wider dynamic
- range, from 0.5 to 56, than the linear MPEG-1 (1 to 31) range. Both are
- sent as a 5-bit FLC side information in the macroblock and slice
- headers.
-
- New Intra-VLC table for dct_coefficient_next (AC run-level events) that
- is a better match for the histogram of Intra-coded pictures. EOB is 4
- bits. The old table, dct_coef_next, is reserved for use in non-intra
- pictures (P, B), although the new table can be used for Intra-coded
- macroblocks in P and B pictures as well.
-
- Alternate scanning pattern that (supposedly) improves entropy coding
- performance over the original Zig-Zag scan used in H.261, JPEG, and MPEG-1.
- The extra scanning pattern is geared towards interlaced video.
-
- Syntax to signal an irregular 3:2 pulldown process (repeat_first_field flag)
-
- Progressive and interlaced frame coding
-
- Syntax to indicate source composite video characteristics useful in post-
- processing operations. (v-axis, field sequence, sub_carrier, phase,
- burst_amplitude, etc.)
-
- Pan & scanning syntax that tells decoder how to, for example, window a
- 4:3 image within a wider 16:9 aspect ratio coded image. Vertical pan
- offset has 1/16th pixel accuracy.
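-
- The non-linear macroblock quantization factor mentioned above can be
- pictured as a piecewise-linear mapping from the 5-bit code. The sketch
- below reproduces the mapping as I recall it from the MPEG-2 video
- specification (step 1 up to 8, then 2, 4 and 8); treat the exact values as
- an assumption to check against the standard. Halving the result gives the
- 0.5 to 56 range quoted above. nonlinear_quantiser_scale is a hypothetical
- name.
-
```python
def nonlinear_quantiser_scale(code: int) -> int:
    """Map a 5-bit code (1..31) to a non-linear scale value (assumed table)."""
    if not 1 <= code <= 31:
        raise ValueError("code must be in 1..31 (0 is not allowed)")
    if code <= 8:
        return code                     # 1..8 in steps of 1
    if code <= 16:
        return 8 + 2 * (code - 8)       # 10..24 in steps of 2
    if code <= 24:
        return 24 + 4 * (code - 16)     # 28..56 in steps of 4
    return 56 + 8 * (code - 24)         # 64..112 in steps of 8


def linear_quantiser_scale(code: int) -> int:
    """Linear mapping for comparison: twice the code."""
    return 2 * code
```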
-
- Macroblock layer:
- Macroblock stuffing is now illegal in MPEG-2 (hurray!!). If stuffing is
- really needed, the encoder can pad slice start codes.
-
- Two organizations for macroblock coefficients (interlaced and progressive)
- signaled by dct_type flag.
-
- Now only one run-level escape code (24 bits) instead of the single (20 bits)
- and double (28 bits) escapes in MPEG-1.
-
- Improved mismatch control in quantization over the original oddification
- method in MPEG-1. Now specifies adding or subtracting one to the 63rd
- AC coefficient depending on parity of the summed coefficients. MPEG-2
- mismatch control is performed on the transform coefficients, whereas in MPEG-
- 1, it is applied to the quantized transform coefficients.
-
- Many additional prediction modes (16x8 MC, field MC, Dual Prime) and,
- correspondingly, macroblock modes.
-
- Overall, MPEG-2's greatest compression improvements over MPEG-1 are:
- prediction modes, Intra VLC table, DC precision, non-linear macroblock
- quantization. Implementation improvements: macroblock stuffing was
- eliminated.
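-
- The two mismatch control styles described in the macroblock layer list can
- be sketched as follows. This is my reading of the two methods, written for
- illustration only: the function names are hypothetical, the block is a
- flat list of 64 coefficients, and the exact point in the decoding chain
- where each step applies should be taken from the standards themselves.
-
```python
def mpeg1_oddify(coeff: int) -> int:
    """MPEG-1 style: force a non-zero even value to be odd, moving toward zero."""
    if coeff != 0 and coeff % 2 == 0:
        return coeff - 1 if coeff > 0 else coeff + 1
    return coeff


def mpeg2_mismatch_control(block):
    """MPEG-2 style: if the sum of all 64 coefficients is even, nudge the last
    coefficient by one so that the sum becomes odd."""
    out = list(block)
    if sum(out) % 2 == 0:
        # Subtract one from an odd value, add one to an even value.
        out[63] += -1 if out[63] % 2 else 1
    return out
```
-
- The MPEG-2 rule touches at most one coefficient per block, whereas
- oddification may alter every even coefficient.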
-
- 41. How do MPEG and JPEG differ?
-
- A. The most fundamental difference is MPEG's use of block-based motion
- compensated prediction (MCP)---a method falling into the general category of
- temporal DPCM.
-
- The second most fundamental difference is in the target application.
- JPEG adopts a general purpose philosophy: independence from color space
- (up to 255 components per frame) and quantization tables for each
- component. Extended modes in JPEG include two sample precision (8 and
- 12 bit sample accuracy), combinations of frequency progressive, spatial
- hierarchically progressive, and amplitude (point transform) progressive
- scanning modes. Further color independence is made possible thanks to
- downloadable Huffman tables (up to one for each component.)
-
- Since MPEG is targeted for a set of specific applications, there is only
- one color space (4:2:0 YCbCr), one sample precision (8 bits), and one
- scanning mode (sequential). Luminance and chrominance share quantization
- and VLC tables. MPEG adds adaptive quantization at the macroblock (16 x
- 16 pixel area) layer. This permits both smoother bit rate control and
- more perceptually uniform quantization throughout the picture and image
- sequence. However, adaptive quantization is part of the Enhanced JPEG
- charter (ISO/IEC 10918-3) currently in verification stage. MPEG variable
- length coding tables are non-downloadable, and are therefore optimized
- for a limited range of compression ratios appropriate for the target
- applications.
-
- The local spatial decorrelation methods in MPEG and JPEG are very
- similar. Picture data is block transform coded with the two-dimensional
- orthonormal 8x8 DCT, with asymmetric basis vectors about time (aka
- DCT-II). The resulting 63 AC transform coefficients are mapped in a zig-zag
- pattern (or alternative scan pattern in MPEG-2) to statistically
- increase the runs of zeros. Coefficients of the vector are then
- uniformly scalar quantized, run-length coded, and finally the run-length
- symbols are variable length coded using a canonical (JPEG) or modified
- Huffman (MPEG) scheme. Global frame redundancy is reduced by 1-D DPCM
- of the block DC coefficients, followed by quantization and variable
- length entropy coding of the quantized DC coefficient.
-
- Frame --(MCP)--> 8x8 spatial block --(DCT)--> 8x8 frequency block
- --(ZZ)--> zig-zag scan --(Q)--> quantization
- --(RLC)--> run-length coding --(VLC)--> variable length coding.
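-
- The middle of that chain (zig-zag scan, uniform quantization, run-length
- coding) can be sketched in a few lines of Python. This is an illustrative
- simplification: a single step size is applied to every coefficient, the
- quantizer simply truncates toward zero, the DC term is returned on its own
- rather than DPCM coded, and the function names are hypothetical.
-
```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals; odd diagonals go down-left, even ones up-right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_level_encode(block, step):
    """Scan, quantize, and run-length code one 8x8 block of DCT coefficients.

    Returns (dc, events) where events is a list of (run_of_zeros, level).
    Trailing zeros are dropped; in a bitstream an EOB code would mark them.
    """
    scan = [block[r][c] for r, c in zigzag_order()]
    quant = [int(v / step) for v in scan]     # truncate toward zero
    dc, ac = quant[0], quant[1:]
    events, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            events.append((run, level))
            run = 0
    return dc, events
```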
-
- The similarities have made it possible for the development of hard-wired
- silicon that can code both standards. Even some highly microcoded
- architectures employing hardwired instruction primitives or functional
- blocks benefit from JPEG/MPEG similarities. There are many additional
- yet minor differences. They include:
-
- 1. In addition to the 8-bit mode, DCT and quantization precision
- in MPEG has a 9-bit and 12-bit mode, respectively, exclusively in non-
- intra coded macroblocks. A 1-bit expansion takes place in the
- macroblock difference operation.
-
- 2. Mismatch control in MPEG-1 forces quantized coefficients to
- become odd values (oddification). JPEG does not employ any mismatch
- mechanism.
-
- 3. JPEG run-length coding produces run-size tokens (run of zeros,
- non-zero coefficient magnitude) whereas MPEG produces fully concatenated
- run-level tokens that do not require magnitude differential bits.
-
- 4. DC values in MPEG-1 are limited to 8-bit precision (a constant
- stepsize of 8), whereas JPEG DC precision can occupy all possible 11-
- bits. MPEG-2, however, re-introduced extra DC precision critical even
- at high compression ratios.
-
-
- Difference between MPEG and H.261
-
- 42. How do MPEG and H.261 differ?
-
- A. H.261, also known as Px64, was targeted for teleconferencing
- applications where motion is naturally more limited. Motion vectors are
- restricted to a range of +/- 15 pixel unit displacements. Prediction
- accuracy is reduced since H.261 motion vectors are specified to only
- integer-pel accuracy. Other quality syntactic differences include: no
- B-pictures, inferior mismatch control.
-
- 43. Is H.261 the de facto teleconferencing standard?
-
- A. Not exactly. To date, about seventy percent of the industrial
- teleconferencing hardware market is controlled by PictureTel of Mass.
- The second largest market controller is Compression Labs of Silicon
- Valley. PictureTel hardware includes compatibility with H.261 as a
- lowest common denominator, but when in communication with other
- PictureTel hardware, it can switch to a mode superior at low bit rates
- (less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is
- done at two-times switched 56 channel (~P = 2) bandwidth. ISDN is still
- expensive. In each direction, video and audio are coded at an aggregate
- rate of 112 kbits/sec (2*56 kbits/sec). The PictureTel proprietary
- compression algorithm is acknowledged to be a combination of spatial
- pyramid, lattice vector quantizer, and an unidentified entropy coding
- method. Motion compensation is considerably more refined and
- sophisticated than the 16x16 integer-pel block method specified in
- H.261.
-
- The Compression Labs proprietary algorithm also offers significant
- improvement over H.261 when linked to other CLI hardware. Local
- decorrelation is based on a DCT-VQ hybrid.
-
- Currently, ITU-TS (International Telecommunication Union--
- Telecommunication Standardization Sector), formerly CCITT, is quietly
- defining an improvement to H.261 with the participation of industry vendors.
-
-
- Rate control
-
- 44. What is the TM rate control and adaptive quantization technique ?
-
- A. The Test model (MPEG-2) and Simulation Model (MPEG-1) were not, by
- any stretch of the imagination, meant to epitomize state-of-the-art
- encoding quality. They were, however, designed to exercise the syntax,
- verify proposals, and test the *relative* compression performance of
- proposals in a timely manner that could be duplicated by
- co-experimenters. Without simplicity, there would no doubt have been
- endless debates over model interpretation. Regardless of all else, more
- advanced techniques would probably trespass into proprietary territory.
-
- The final test model for MPEG-2 is TM version 5b, aka TM version 6. The
- final MPEG-1 simulation model is version 3. The MPEG-2 TM rate control
- method offers a dramatic improvement over the SM method. TM adds more
- accurate estimation of macroblock complexity through use of limited a
- priori information. Macroblock quantization adjustments are computed on
- a macroblock basis, instead of once-per-slice.
-
- 45. How does the TM work?
- A. Rate control and adaptive quantization are divided into three steps:
-
- Step One: Bit Allocation
-
- In Complexity Estimation, the global complexity measures assign
- relative weights to each picture type (I,P,B). These weights (Xi, Xp,
- Xb) are reflected by the typical coded frame size of I, P, and B
- pictures (see typical frame size discussion). I pictures are usually
- assigned the largest weight since they have the greatest stability
- factor in an image sequence. B pictures are assigned the smallest
- weight since B energy does not propagate into other pictures and B pictures
- are usually highly correlated with neighboring P and I pictures.
-
- The bit target for a frame is based on the frame type, the remaining number
- of bits left in the Group of Pictures (GOP) allocation, and the immediate
- statistical history of previously coded pictures.
-
- Step Two: Rate Control
-
- Rate control attempts to adjust bit allocation if there is significant
- difference between the target bits (anticipated bits) and actual coded
- bits for a block of data. If the virtual buffer begins to overflow, the
- macroblock quantization step size is increased, resulting in a smaller
- yield of coded bits in subsequent macroblocks. Likewise, if underflow
- begins, the step size is decreased. The Test Model assumes that the
- target picture has a spatially uniform distribution of bits. This is a safe
- approximation since spatial activity and perceived quantization noise
- are almost inversely proportional. Of course, the user is free to
- design a custom distribution, perhaps targeting more bits in areas that
- contain text, for example.
-
-
- Step Three: Adaptive Quantization
-
- The final step modulates the macroblock quantization step size obtained in
- Step 2 by a local activity measure. The activity measure itself is normalized
- against the most recently coded picture of the same type (I, P, or B). The
- activity for a macroblock is chosen as the minimum among the four 8x8 block
- luminance variances. Choosing the minimum block is part of the concept that
- a macroblock is no better than the block of highest visible distortion
- (weakest link in the chain).
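-
- The three steps can be sketched in Python as below. The formulas and the
- constants (k_p, k_b, the factor 31, the activity normalization) follow my
- recollection of the published Test Model 5 description and are not given
- in this FAQ, so treat them as assumptions to verify; the function names
- are hypothetical, and safeguards such as minimum picture targets are
- left out.
-
```python
def picture_target(remaining_bits, n_p, n_b, x_i, x_p, x_b, ptype,
                   k_p=1.0, k_b=1.4):
    """Step One: share the bits left in the GOP by relative complexity.

    n_p, n_b are the P and B pictures still to code; x_* are the weights.
    """
    if ptype == "I":
        denom = 1 + n_p * x_p / (x_i * k_p) + n_b * x_b / (x_i * k_b)
    elif ptype == "P":
        denom = n_p + n_b * k_p * x_b / (k_b * x_p)
    else:  # "B"
        denom = n_b + n_p * k_b * x_p / (k_p * x_b)
    return remaining_bits / denom


def buffer_quantizer(d0, bits_so_far, target, mb_index, mb_count, reaction):
    """Step Two: turn virtual buffer fullness into a reference quantizer.

    More bits spent than the uniform share so far => larger step size.
    """
    fullness = d0 + bits_so_far - target * mb_index / mb_count
    return fullness * 31 / reaction


def variance(samples):
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def adaptive_mquant(q_ref, luma_blocks, avg_act):
    """Step Three: modulate the quantizer by normalized local activity."""
    act = 1 + min(variance(b) for b in luma_blocks)   # flattest 8x8 block wins
    n_act = (2 * act + avg_act) / (act + 2 * avg_act)  # stays within 0.5..2
    return min(31, max(1, round(q_ref * n_act)))
```
-
- With these definitions a flat macroblock in a busy picture gets a finer
- step size than the buffer state alone would suggest, and a busy macroblock
- gets a coarser one.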
-
- 46. What is a good motion estimation method, then?
-
- A. When shopping for motion vectors, the three basic characteristics
- are: Search range, search pattern, and matching criteria. Search
- pattern has the greatest impact on finding the best vector. Hierarchical
- search patterns first find the best match between downsampled images of
- the reference and target pictures and then refine the vector through
- progressively higher resolutions. When compared to other fast methods,
- hierarchical patterns are less likely to be confused by extremely local
- distortion minimums as being a best match. Also note that subsampled search
- and hierarchical search are not synonymous.
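-
- A two-level version of the hierarchical pattern described above might look
- like the following Python sketch. It is only an illustration under simple
- assumptions: pictures are lists of rows of luminance samples, downsampling
- is a 2x2 average, the matching criterion is the sum of absolute
- differences, and all names are hypothetical. It assumes the block lies
- inside the picture and that the refined position stays inside it too.
-
```python
def downsample(img):
    """Halve both dimensions by averaging each 2x2 neighborhood."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def search(cur, ref, bx, by, n, cx, cy, radius):
    """Best (sad, dx, dy) within +/- radius of the displacement (cx, cy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            # Skip candidates that would read outside the reference picture.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
                       for y in range(n) for x in range(n))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def hierarchical_search(cur, ref, bx, by, n=16, radius=4):
    """Coarse search on half-size pictures, then a +/-1 refinement at full size."""
    _, cdx, cdy = search(downsample(cur), downsample(ref),
                         bx // 2, by // 2, n // 2, 0, 0, radius)
    return search(cur, ref, bx, by, n, 2 * cdx, 2 * cdy, 1)
```
-
- A coarse radius of 4 covers roughly +/- 8 full-resolution samples while
- evaluating far fewer candidates than an exhaustive search over that range.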
-
- Q. Is there a limit to the length of motion vectors?
-
- The search area is unlimited, but the reconstructed motion vectors must
- not:
-
- a. point beyond the picture boundaries (1 <= MV_x <= luminancewidth - 16)
- and (1 <= MV_y <= luminanceheight - 16). The - 16 is due to the fact
- that the motion vector origin is the upper left hand corner of a
- macroblock.
-
- b. In Constrained Parameters MPEG-1, the motion vector is limited to a
- range of [-64,+63.5] luminance samples with half-pel accuracy, and
- [-128,+127] with integer-pel accuracy. Break the constrained parameters
- rules and your video sequence will not likely display on many hardware
- devices.
-
- c. In MPEG-2 Video Main Profile at Main Level, the motion vectors are
- always on a half-pel co-ordinate grid, and the vertical range is
- restricted to [-64, +63.5], and the horizontal limit is [-256,+255.5].
-
- d. in MPEG-1, the syntactic limit of the motion vector is [-1024,+1023]
- integer pel, horizontal and vertical.
-
- e. in MPEG-2, the syntactic limit of the motion vector is [-2048,+2047.5]
- horizontal, [-1024,+1023.5] vertical.
-
-
- 47. Is exhaustive search "optimal" ?
-
- A. Definitely not in the context of block-based MCP video. Since one
- motion vector represents the prediction of 256 pixels, divergent pixels
- within the macroblock are misrepresented by the "global" vector. This
- leads back to the general philosophy of block-based coding as an
- approximation technique. In their ICASSP93 paper, Sullivan discusses ways in
- which block-based prediction schemes can solve part of this problem.
-
- Exhaustive search may find blocks with the least distortion (displaced frame
- difference) but will not produce motion vectors with the lowest entropy.
-
- 48. What are some advanced encoding methods?
-
- Quantizer feedback: determine the dependent quantization stepsize by
- modeling quantization error propagating over multiple pictures. [Uz/et
- al ICASSP 93, Ortega/Vetterli/et al ICASSP 93]
-
- Smoothness constraint placed on local activity measures: immediate blocks
- outside the target macroblock are considered when selecting the macroblock
- quantization stepsize. [Thomson/Savitier patent]
-
- Horizontal variance: measure variance between columns of pixels in addition
- to the traditional measure of variance along rows (lines) when making
- field/frame macroblock prediction decision.
-
- DFD energy: examine DFD energy/variance when making Intra/Non-intra
- macroblock decision.
-
- Activity measures: use total bits from a first-pass encoding of a picture or
- macroblock as a measure of the activity. Coded bits is a more accurate
- reflection of local complexity than variance. [Thomson/Savitier patent]
-
- Motion vector cost: this is true for any syntax element, really. Signaling
- a macroblock quantization factor or a large motion vector differential can
- cost more than making up the difference with extra quantized DFD (prediction
- error) bits. The optimum can be found with some form of Lagrangian
- optimization. In summary, in any compression system with side information,
- there is an optimum point between signaling overhead (e.g. prediction) and
- prediction error.
-
- Liberal Interpretations of the Forward DCT:
- Borrowing from the concept that the DCT is simply a filter bank, a
- technique that seems to be gaining popularity is basis vector shaping.
- Usually this is combined with the quantization stage since the two are
- tied closely together in a rate-distortion sense. The idea is to use
- the basis vector shaping as a cheap alternative to pre-filtering by
- combining the more desirable data adaptive properties of pre-filtering/
- pre-processing into the transformation process... yet still reconstruct
- a picture in the decoder using the standard IDCT that looks reasonably
- like the source. Some more clever schemes will apply a form of windowing.
- [Warning: watch out for eigenimage/basis vector orthogonality.]
-
- Frequency-domain enhancements:
- Enhancements are applied after the DCT (and possibly quantization) stage
- to the transform coefficients. This borrows from the concept: if you
- don't like the (quantized) transformed results, simply reshape them into
- something you do like. Suppressing isolated small amplitudes is popular.
-
- Temporal spreading of quantization error:
- This method is similar to the original intent behind color subcarrier
- phase alternation by field in the NTSC, PAL, and SECAM analog TV
- standards: for stationary areas, noise does not "hang" in one location,
- but dances about the image over time to give a more uniform effect.
- Distribution makes it more difficult for the eye to "catch on" to
- trouble spots (due to the latent temporal response curve of human
- vision). Simple encoder models tend to do this naturally but will not
- solve all situations.
-
-
- Look-ahead and adaptive frame cycle structures: analyze picture activity
- several pictures into the future, looking for scene changes or motion
- statistics.
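-
- The trade-off in the motion vector cost item can be written as a
- Lagrangian cost J = D + lambda * R, where D is the distortion of a choice
- and R the bits needed to signal it. The sketch below picks the candidate
- with the smallest J. The candidate numbers and the lambda values are made
- up for illustration, and best_choice is a hypothetical name.
-
```python
def best_choice(candidates, lam):
    """Pick the (label, distortion, bits) tuple minimizing D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])


candidates = [
    ("zero_mv", 120, 2),    # slightly worse prediction, cheap to signal
    ("long_mv", 100, 14),   # better prediction, expensive vector differential
]
print(best_choice(candidates, 0.0)[0])   # distortion only
print(best_choice(candidates, 2.0)[0])   # bits now count against the choice
```
-
- With lambda = 0 the lower-distortion vector wins; at lambda = 2 its extra
- 12 bits outweigh the 20 units of distortion saved and the cheaper vector is
- preferred.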
-
- It is easy to spot encoders that do not employ any advanced encoding
- techniques: reconstructed video usually contains ringing around edges,
- color bleeding, and lots of noise.
-
- 49. Is so-and-so really MPEG compliant ?
-
- A. At the very least, there are two areas of conformance/compliance in
- MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically
- speaking, video bitstreams consisting entirely of I-frames (such as
- those generated by Xing software) are syntactically compliant with the
- MPEG specification. The I-frame sequence is simply a subset of the full
- syntax. Compliant bitstreams must obey the range limits (e.g. motion
- vectors limited to +/-128, frame sizes, frame rates, etc.) and syntax
- rules (e.g. all slices must commence and terminate with a non-skipped
- macroblock, no gaps between slices, etc.).
-
- Decoders, however, cannot escape true conformance. For example, a
- decoder that cannot decode P or B frames is *not* a legal MPEG decoder.
- Likewise, full arithmetic precision must be obeyed before any decoder
- can be called "MPEG compliant." The IDCT, inverse quantizer, and
- motion compensated predictor must meet the specification requirements...
- which are fairly rigid (e.g. no more than 1 least significant bit of
- error between reference and test decoders). Real-time conformance is
- more complicated to measure than arithmetic precision, but it is
- reasonable to expect that decoders that skip frames on reasonable
- bitstreams are not likely to be considered compliant.
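-
- The 1-LSB arithmetic requirement mentioned above can be pictured with a
- minimal sketch. It assumes decoded pictures are held as lists of rows of
- integer samples; the function names are hypothetical, and real conformance
- testing involves further criteria that are not modelled here.

```python
def peak_error(reference, test):
    """Largest absolute sample difference between two decoded pictures."""
    return max(
        abs(r - t)
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    )


def within_one_lsb(reference, test):
    """True if no sample differs by more than 1 least significant bit."""
    return peak_error(reference, test) <= 1
```

- A test decoder whose output differs from the reference decoder by 2 or
- more on any sample would fail this particular check.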
-
- Artifacts
-
- 50. What are the tell-tale MPEG artifacts?
-
- A. If the encoder did its job properly, and the user specified a proper
- balance between sample rate and bitrate, there shouldn't be any visible
- artifacts. However, in sub-optimal systems, you can look for:
-
- Gibbs phenomenon/Ringing/Aliasing (too few AC bits, not enough
- pre-processing)
-
- Blockiness (not considering your neighbors before quantizing)
-
- Posterization (too few DC bits)
-
- Checkerboards (DCT eigenimages as a result of too few AC coefficients)
-
- Color bleeding (not considering color in encoder cost model, not
- subtracting color at edges of objects, etc.)
-
- 51. Where are the weak points of MPEG video ?
- A.
- Texture patterns (rapidly alternating lines)
- sharp edges (especially text)
- [installment 3]
-
-
- 52. What are some myths about MPEG?
- A. There are a few major myths that I am aware of:
-
- 1. Block displacements: macroblock predictions are formed out of
- arbitrary 16x16 (or 16x8/16x16 in MPEG-2) areas from previously
- reconstructed pictures. Many people believe that the prediction
- macroblocks have boundaries that fall on interchange boundaries (pixel
- 0, 16, 32, 48... line 0, 16, 32, 48... etc.). In fact, motion vectors
- represent relative translations with respect to the target
- reconstruction macroblock coordinates. The motion vectors can point to
- half pixel coordinates, requiring the prediction macroblock to be
- formed via bi-linear interpolation of pixels.
-
-
- 2. Displaced frame (macroblock) difference construction: the prediction
- error formed as the difference between the prediction macroblock and
- source macroblock is coded much like an Intra macroblock. The
- prediction may come from different locations (as in bi-directional
- prediction--or in MPEG-2--16x8, field-in-frame, and Dual Prime), but the
- DFD is always coded as a 16x16 unit.
-
- 3. Compression ratios
-
- You hear 200:1 and 100:1 in the media. Utter rubbish. The true range
- is between 16:1 and 40:1. Spreading misinformation about compression
- ratios in public will catch the attention of the infamous "MPEG Police."
- They say mild-mannered Michael Barnsley will snap, without warning, into
- violent rage if he doesn't get the upper bunk bed.
-
- 4. Picture coding types all consist of the same macroblocks
-
- Macroblocks within I pictures are strictly intra-coded. Macroblocks
- within P pictures can be either predicted or intra-coded, and within B
- pictures they can be bi-directional, forward, backward, or intra. Additional
- macroblock mode switches include: predicted with no motion
- compensation, modified macroblock quantization, coding of prediction error or
- not. The switches are concatenated into the macroblock_type side information
- and variable length coded in the macroblock header.
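-
- Myth 1 above notes that a prediction macroblock can start at any pixel, or
- half pixel, of the reference picture. A minimal sketch of that idea follows.
- It assumes the reference picture is a list of rows of integer samples, that
- the vector is given in half-pixel units, and that the displaced area stays
- inside the picture; the function name and argument layout are hypothetical.

```python
def predict_block(ref, x, y, mv_x2, mv_y2, size=16):
    """Form a size x size prediction from the reference picture `ref`.

    (x, y) is the top-left corner of the target macroblock and
    (mv_x2, mv_y2) is the motion vector in half-pixel units.
    """
    # Split each component into a whole-pixel offset and a half-pixel flag.
    # The arithmetic shift floors, so the flag is 0 or 1 for negative
    # vectors as well.
    ix, hx = x + (mv_x2 >> 1), mv_x2 & 1
    iy, hy = y + (mv_y2 >> 1), mv_y2 & 1
    pred = []
    for r in range(size):
        row = []
        for c in range(size):
            a = ref[iy + r][ix + c]
            b = ref[iy + r][ix + c + hx]
            d = ref[iy + r + hy][ix + c]
            e = ref[iy + r + hy][ix + c + hx]
            # With both flags 0 this returns the sample itself; with one
            # flag set it is the rounded mean of two neighbours; with both
            # set it is the rounded mean of four neighbours.
            row.append((a + b + d + e + 2) >> 2)
        pred.append(row)
    return pred
```

- Nothing in the sketch aligns the source area to a 16-pixel grid: the
- prediction simply starts wherever the vector points.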
-
- 53. What is the color space of MPEG?
-
- MPEG strictly specifies the YCbCr color space, not YUV or YIQ or YPbPr
- or YDrDb or any other color difference variations. Regardless of any
- bitstream parameters, MPEG-1 and MPEG-2 Video Main Profile specify
- 4:2:0 chroma ratio, where the color difference channels (Cb, Cr) have
- half the "resolution" or sample grid density in both the horizontal and
- vertical direction with respect to luminance.
-
- MPEG-2 High Profile includes an option for 4:2:2 and 4:4:4 coding.
- Applications for this are likely to be broadcasting and contribution
- equipment.
-
- 54. Don't you mean 4:1:1 ?
-
- A. No, no, no. Here is a table of ratios:
-
-
- CCIR 601 (60 Hz) image            Chroma sub-sampling factors
- format   Y           Cb, Cr       Vertical    Horizontal
- ------   ---------   ---------    --------    ----------
- 4:4:4    720 x 480   720 x 480    none        none
- 4:2:2    720 x 480   360 x 480    none        2:1
- 4:2:0    720 x 480   360 x 240    2:1         2:1
- 4:1:1    720 x 480   180 x 480    none        4:1
- 4:1:0    720 x 480   180 x 120    4:1         4:1
-
- 3:2:2, 3:1:1, and 3:1:0 are less common variations.
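-
- The Cb, Cr dimensions follow directly from the vertical and horizontal
- sub-sampling factors. A minimal sketch, assuming the 720 x 480 luminance
- grid of the table; the dictionary and function names are hypothetical.

```python
# (horizontal factor, vertical factor) by which Cb and Cr are decimated
# relative to luminance, taken from the sub-sampling columns of the table.
CHROMA_FACTORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
    "4:1:0": (4, 4),
}


def chroma_size(fmt, luma_width=720, luma_height=480):
    """Return (width, height) of each chroma plane for a given format."""
    horizontal, vertical = CHROMA_FACTORS[fmt]
    return luma_width // horizontal, luma_height // vertical
```

- For example, 4:2:0 gives 360 x 240 per chroma plane, while 4:1:1 keeps
- every line but only every fourth sample, giving 180 x 480.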
-
- 55. Why did MPEG choose 4:2:0 ? Isn't 4:2:2 the standard for TV?
-
- A. At least four reasons I can think of:
-
- 1. 4:2:0 picture memory requirements are 25% less than those of
- 4:2:2 pictures (1.5 versus 2 samples per pixel). MPEG-1 decoders are able
- to snugly fit all 3 SIF pictures (1 reconstruction & display, 2 prediction)
- into 512 KBytes of buffer space. CCIR 601 is a tighter fit into 2 Mbytes.
-
- 2. The subjective difference between 4:2:0 and 4:2:2 is minimal, when
- considering consumer display equipment and distribution compression ratios.
-
- 3. Vertical decimation increases compression efficiency by reducing syntax
- overhead posed in an 8 block (4:2:2) macroblock structure.
-
- 4. You're compressing the hell out of the video signal, so what possible
- difference can the 0:0:2 high-pass make?
-
- Interlacing and the 62 microsecond gap between successively scanned lines
- introduces some discontinuities, but most of this can be alleviated through
- pre-processing.
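-
- The picture memory argument in reason 1 can be checked with a little
- arithmetic. The sketch below assumes 8-bit samples (one byte each), a
- 352 x 240 SIF picture, a 720 x 480 CCIR 601 picture, and three stored
- pictures; it ignores the coded bitstream buffer, and the function names
- are hypothetical.

```python
# Samples per pixel as a (numerator, denominator) pair: one luminance
# sample plus the two sub-sampled chroma planes.
SAMPLES_PER_PIXEL = {"4:2:0": (3, 2), "4:2:2": (2, 1), "4:4:4": (3, 1)}


def picture_bytes(width, height, chroma_format="4:2:0"):
    """Bytes needed to store one picture at 8 bits per sample."""
    num, den = SAMPLES_PER_PIXEL[chroma_format]
    return width * height * num // den


def decoder_picture_memory(width, height, chroma_format="4:2:0", pictures=3):
    """Bytes needed for the decoder's stored pictures."""
    return pictures * picture_bytes(width, height, chroma_format)
```

- Three 4:2:0 SIF pictures come to 380,160 bytes, which fits in 512 KBytes
- (524,288 bytes). Three 4:2:0 CCIR 601 pictures come to 1,555,200 bytes,
- which fits in 2 Mbytes. A 4:2:0 picture is three quarters the size of the
- same picture in 4:2:2.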
-
- 56. What is the precision of MPEG samples?
-
- A. By definition, MPEG samples have no more and no less than 8-bits uniform
- sample precision (256 quantization levels). For luminance (which is
- unsigned) data, black corresponds to level 0, white is level 255. However, in
- CCIR Recommendation 601 chromaticity, levels 0 through 15 and 236 through 255
- are reserved for blanking signal excursions. MPEG currently has no such
- clipped excursion restrictions, although decoders might take care to ensure
- active samples do not exceed these limits. With three color components per
- pixel, the total combination is roughly 16.8 million colors (i.e. 24-bits).
-
- 57. What is all the fuss with cositing of chroma components?
-
- A. It is moderately important to properly co-site chroma samples,
- otherwise a sort of chroma shifting effect (exhibited as a halo) may result
- when the reconstructed video is displayed. In MPEG-1 video, the chroma
- samples are exactly centered between the 4 luminance samples (Fig 1.) To
- maintain compatibility with the CCIR 601 horizontal chroma locations and
- simplify implementation (eliminate need for phase shift), MPEG-2 chroma
- samples are arranged as per Fig.2.
-
-     Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
-       C       C          C       C
-     Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
-
-     Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
-       C       C          C       C
-     Y   Y   Y   Y        Y   Y   Y   Y        YC  Y   YC  Y
-
-     Fig.1 MPEG-1         Fig.2 MPEG-2         Fig.3 MPEG-2 and
-     4:2:0 organization   4:2:0 organization   CCIR Rec. 601
-                                               4:2:2 organization
-
- MPEG for the data compression expert
-
- 58. How would you explain MPEG to the data compression expert?
-
- A. MPEG video is a block-based video scheme.
-
-
- 59. How does MPEG video really compare to TV, VHS, laserdisc ?
- A. VHS picture quality can be achieved for source film video at about 1
- million bits per second (with proprietary encoding methods). It is
- very difficult to objectively compare MPEG to VHS. The response curve
- of VHS places -3 dB at around 2 MHz of analog luminance bandwidth
- (equivalent to 200 samples/line). VHS chroma is considerably less dense
- in the horizontal direction than MPEG source video (compare 80
- samples/line to 176!). From a sampling density perspective, VHS is
- superior only in the vertical direction (480 luminance lines compared
- to 240)... but when taking into account (supposedly such things as)
- interfield magnetic tape crosstalk and the TV monitor Kell factor, the
- perceptual vertical advantage is not all that significant. VHS is
- prone to such inconveniences as timing errors (an annoyance addressed
- by time base correctors), whereas digital video is fully discretized.
- Pre-recorded VHS is typically recorded at very high duplication speeds
- (5 to 15 times real time playback speed), opening up additional avenues
- for artifacts. In gist, MPEG-1 at its nominal parameters can match
- VHS's sexy low-pass-filtered look.
-
- With careful coding schemes, broadcast NTSC quality can be approximated at
- about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec. Of course, sports
- sequences with complex spatial-temporal activity should be treated with bit
- rates more like 5 and 6 Mbit/sec, respectively. Laserdisc is a tough one to
- compare. Laserdiscs are encoded with composite video (NTSC or PAL).
- Manufacturers of laser disc players make claims of up to 425 TVL (or 567
- samples/line) response. Thus it could be said the laserdisc has a 567 x 480 x
- 30 Hz "potential resolution". The carrier-to-noise ratio is typically better
- than 48 dB. Timing is excellent. Yet some of the clean characteristics of
- laserdisc can be achieved with MPEG-1 at 1.15 Mbit/sec (SIF rates),
- especially for those areas of medium detail (low spatial activity) in the
- presence of uniform motion. This may be why some people say MPEG-1 video at
- 1.15 Mbit/sec looks almost as good as Laserdisc or Super VHS at times.
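-
- The step from 425 TVL to 567 samples/line above is consistent with the
- usual convention that TV lines are counted per picture height, so the
- horizontal count scales with the 4:3 aspect ratio. A minimal sketch,
- assuming that convention; the function name is hypothetical.

```python
def tvl_to_samples_per_line(tvl, aspect_ratio=4 / 3):
    """Convert a TV-lines figure (per picture height) to samples per line."""
    return round(tvl * aspect_ratio)
```

- With the manufacturers' 425 TVL claim this gives the 567 samples/line
- quoted in the answer.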
-
- 60. What are the typical MPEG-2 bitrates and picture quality?
-
-                      Picture type (bits per coded picture)
-                      I          P          B          Average
- MPEG-1 SIF
- @ 1.15 Mbit/sec      150,000    50,000     20,000     38,000
-
- MPEG-2 601
- @ 4.00 Mbit/sec      400,000    200,000    80,000     130,000
-
- Note: parameters assume Test Model for encoding, I frame distance of 15 (N =
- 15), and a P frame distance of 3 (M = 3).
-
- Of course, among differing source material, scene changes, and use of
- advanced encoder models... these numbers can be significantly different.
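-
- The Average column can be related to the I, P, and B figures through the
- frame cycle given in the note. A minimal sketch, assuming one I picture per
- N pictures and an anchor (I or P) picture every M pictures, and a display
- rate of 30 pictures per second; the function name is hypothetical.

```python
def average_picture_bits(i_bits, p_bits, b_bits, n=15, m=3):
    """Mean coded picture size over one N-picture cycle."""
    num_i = 1                      # one I picture per cycle
    num_p = n // m - 1             # remaining anchor pictures are P
    num_b = n - num_i - num_p      # everything else is B
    total = i_bits * num_i + p_bits * num_p + b_bits * num_b
    return total / n
```

- With N = 15 and M = 3 the cycle holds 1 I, 4 P, and 10 B pictures. The
- MPEG-1 row then averages about 36,667 bits per picture, close to the
- 38,333 bits available per picture at 1.15 Mbit/sec and 30 Hz. The MPEG-2
- row averages about 133,333 bits, matching 4 Mbit/sec at 30 Hz.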
-
- 61. At what bitrates is MPEG-2 video optimal?
- A. The Test subgroup has defined a few examples:
-
- "Sweet spot" sampling dimensions and bit rates for MPEG-2:
-
- Dimensions      Coded rate   Comments
- -------------   ----------   -------------------------------------------
- 352x480x24 Hz   2 Mbit/sec   Half horizontal 601. Looks almost NTSC
- (progressive)                broadcast quality, and is a good (better)
-                              substitute for VHS. Intended for film src.
-
- 544x480x30 Hz   4 Mbit/sec   PAL broadcast quality (nearly full capture
- (interlaced)                 of 5.4 MHz luminance carrier). Also
-                              4:3 image dimensions windowed within 720
-                              sample/line 16:9 aspect ratio via pan&scan.
-
- 704x480x30 Hz   6 Mbit/sec   Full CCIR 601 sampling dimensions.
- (interlaced)
-
- [these numbers subject to change at whim of MPEG Test subgroup]
-
-
-
- 62. Why does film perform so well with MPEG ?
- A. Several reasons, really:
-
- 1) The frame rate is 24 Hz (instead of 30 Hz) which is a savings of
- some 20%.
- 2) The film source video is inherently progressive. Hence no fussy
- interlaced spectral frequencies.
- 3) The pre-digital source was severely oversampled (compare 352 x 240
- SIF to 35 millimeter film at, say, 3000 x 2000 samples). This can
- result in a very high quality signal, whereas most video cameras do
- not oversample, especially in the vertical direction.
- 4) Finally, the spatial and temporal modulation transfer function (MTF)
- characteristics (motion blur, etc) of film are more amenable to
- the transform and quantization methods of MPEG.
-
- 63. What is the best compression ratio for MPEG ?
-
- A. The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel
- inter. Experimentation has shown that intra frame coding with the
- familiar DCT-Quantization-Huffman hybrid algorithm achieves optimal
- performance at about an average of 1.2 bits/sample or about 6:1
- compression ratio. Below this point, artifacts become noticeable.
-
- 64. Can MPEG be used to code still frames?
-
- A. Yes. There are, of course, advantages and disadvantages to using
- MPEG over JPEG:
-
- Disadvantages:
-
- 1. MPEG has only one color space
- 2. MPEG-1 and MPEG-2 Main Profile luma and chroma share quantization
- and VLC tables
- 3. MPEG-1 is syntactically limited to 4k x 4k images, and 16k x 16k for
- MPEG-2.
-
- Advantages:
-
- 1. MPEG possesses adaptive quantization
-
- 2. With its limited still image syntax, MPEG averts any temptation to use
- unnecessary, expensive, and academic encoding methods that have little
- impact on the overall picture quality (you know who you are).
-
- Philips' CD-I spec. has a requirement for a MPEG still frame mode, with
- double SIF image resolution. This is technically feasible mostly thanks to
- the fact that only one picture buffer is needed to decode a still image
- instead of three buffers.
-
- 65. Is there an MPEG file format?
-
- A. Not exactly. The necessary signal elements that indicate image size,
- picture rate, aspect ratio, etc. are already contained within the sequence
- layer of the MPEG video stream. The White Book format for Karaoke and CD-I
- movies specifies a range of (time-division) multiplexing strategies for audio
- and video bitstreams. A directory format listing scenes and their locations
- on the disc is associated with the White Book specification.
-
- 66. What are some pre-processing enhancements ?
-
- Adaptive de-interlacing:
-
- This method maps interlaced video from a higher sampling rate (e.g. 720 x 480)
- into a lower rate, progressive format (352 x 240). The most basic algorithm
- measures the correlation between two immediate macroblock fields, and if the
- correlation is high enough, uses an average of both fields to form a frame
- macroblock. Otherwise, a field area from one field (usually of the same
- parity) is selected. More clever algorithms are much more complex than this,
- and may involve median filtering, and multirate/multidimensional tools.
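-
- The most basic algorithm described above can be sketched as follows. This
- is only an illustration under stated assumptions: a block is a list of rows
- of integer samples with even rows belonging to one field and odd rows to
- the other, the correlation measure and the 0.9 threshold are arbitrary
- choices, and horizontal decimation is not shown. All names are hypothetical.

```python
def field_correlation(top, bottom):
    """Normalized correlation between two equally sized fields."""
    a = [v for row in top for v in row]
    b = [v for row in bottom for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # Flat areas: treat equal means as fully correlated.
        return 1.0 if mean_a == mean_b else 0.0
    return cov / (var_a * var_b) ** 0.5


def deinterlace_block(frame_block, threshold=0.9):
    """Merge two interleaved fields into one half-height progressive block."""
    top = frame_block[0::2]
    bottom = frame_block[1::2]
    if field_correlation(top, bottom) >= threshold:
        # Fields agree (little motion): average them, rounding to nearest.
        return [
            [(t + b + 1) // 2 for t, b in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(top, bottom)
        ]
    # Fields disagree (motion): keep a single field.
    return [row[:] for row in top]
```

- The threshold sets how readily the two fields are blended; a stricter value
- falls back to single-field selection more often.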
-
- Pre-anti-aliasing and Pre-blockiness reduction:
- A common method in still image coding is to pre-smooth the image before
- encoding. For example, if pre-analysis of a frame indicates that serious
- artifacts will arise if the picture were to be coded in the current condition
- (i.e. below the sweet spot), a pre-anti-aliasing filter can be applied. This
- can be as simple as having a smoothing severity proportional to the image
- activity. The pre-filter can be global (same smoothing factor for whole
- image or sequence) or locally adaptive. More complex methods will again use
- multirate/multidimensional methods.
-
- One straightforward concept from multidimensional/multirate pre-processing is
- to apply source video whose resolution (sampling density) is greater than
- the target source and reconstruction sample rates. This follows the basic
- principles of oversampling, as found in A/D converters.
-
- These filters emphasize the fact that most information content is contained
- in the lower harmonics of a picture anyway. VHS is hardly considered to be a
- sharp cut-off medium, tragically implying that the "320 x 480 potential" of
- VHS is never truly realized.
-
- 67. Why use these "advanced" pre-filtering techniques?
-
- A. Think of the DCT and quantizer as an A/D converter. Think of the DCT/Q
- pre-filter as the required anti-alias prefilter found before every A/D. The
- big difference of course is that the DCT quantizer assigns a varying number
- of bits per transform coefficient. Judging on the normalized activity
- measured in the pre-analysis stage of video encoding (assuming you even have
- a pre-analysis stage), and the target buffer size status, you have a fairly
- good idea of how many bits can be spared for the target macroblock, for
- example.
-
- Other pre-filtering techniques mostly take into account: texture patterns,
- masking, edges, and motion activity. Many additional advanced techniques can
- be applied at different immediate layers of video encoding (picture, slice,
- macroblock, block, etc.).
-
-
- 68. What about post-processing enhancements?
-
- Some research has been carried out in this area. Non-linear interpolation
- methods have been published by Wu and Gersho (e.g. ICASSP 93), convex hull
- projections for MAP (Severinson, ICASSP 93), and others. Post-processing
- unfortunately defies the spirit of MPEG conformance. Decoders should produce
- similar reconstructions. Enhancements should ideally be done during the pre-
- processing and encoding stages.
-
- 69. Can motion vectors be used to measure object velocity?
-
- A. Motion vector information cannot be reliably used as a means of
- determining object velocity unless the encoder model specifically set
- out to do so. First, encoder models that optimize picture quality generate
- vectors that typically minimize prediction error and, consequently,
- the vectors often do not represent true object translation. Standards
- converters that resample one frame rate to another (as in NTSC to PAL)
- use different methods (motion vector field estimation, edge detection, et
- al) that are not concerned with optimizing ratios such as SNR vs bitrate.
- Secondly, motion vectors are not transmitted for all macroblocks anyway.
-
- 70. How do you code interlaced video with MPEG-1 syntax?
- A. Two methods can be applied to interlaced video that maintain
- syntactic compatibility with MPEG-1 (which was originally designed for
- progressive frames only). In the field concatenation method, the
- encoder model can carefully construct predictions and prediction errors
- that realize good compression but maintain field integrity (distinction
- between adjacent fields of opposite parity). Some pre-processing
- techniques can also be applied to the interlaced source video that
- would, e.g., lessen sharp vertical frequencies.
-
- This technique is not efficient of course. On the other hand, if the
- original source was progressive (e.g. film), then it is more trivial to
- convert the interlaced source to a progressive format before encoding.
- (MPEG-2 would then only offer superior performance through greater DC
- block precision, non-linear mquant, intra VLC, etc.) Reconstructed
- frames are re-interlaced in the decoder Display process.
-
- The second syntactically compatible method codes fields as separate pictures.
- This approach has been acknowledged not to work as well.
-
- 71. Is MPEG patented?
- A. Yes and no. Many encoding methods are patented. Approximately 11
- blocking patents, that is, patents that are general enough to be unavoidable
- in any implementation, have been recently identified.
-
- A patent pool is being formed within MPEG where a single royalty fee would be
- split among the 31 patent-holding companies.
-
- 72. How many cable box alliances are there?
-
- A. Many. To start with:
-
- Scientific Atlanta (SA), Kaleida, and Motorola:
- SA will build the box, Motorola the chips, and Kaleida the
- O/S and user interface (using ScriptX of course).
-
- Silicon Graphics (SGI), Scientific Atlanta, and Toshiba
- For the Time Warner's Orlando trial, SGI will provide the
- RISC (MIPS R4000) and software, SA will do the box again,
- and Toshiba will provide the chips.
-
- General Instruments (GI) and Microsoft:
- GI will make the box and Intel will supply the special low-cost
- 386SL processor on which a 1MB flash EPROM executable core
- of Microsoft windows and DOS will run. Microsoft will develop the
- user interface.
-
- Hewlett Packard (HP):
- HP will manufacture and/or design low cost, open architecture set-top
- decoder boxes (not a part of the Eon wireless deal). The CPU will
- explicitly not use an 80x86 based processor.
-
-
- CLI and Philips:
- Compression Labs will provide the encoder technology and Philips
- will provide the decoder technology for an ADSL system whose
- transport structure will be put together by Broadband Technologies.
-
- ["These alliances subject to change at the whim of PR departments
- and market forces."]
-
- 73. Will there be an MPEG video tape format?
-
- A. Not exactly. A consortium of international companies is
- co-developing a consumer digital video 6 millimeter wide, metal particle
- tape format. Due to the initial high cost of MPEG encoders, a JPEG-like
- compression method will be used for inexpensive encoding of typical
- consumer source video (broadcast PAL, NTSC). The natural consequence of
- still image methods is less efficient use of bandwidth: 25 Mbit/sec for
- the same subjective real-time playback quality achieved at 6 Mbit/sec
- possible with MPEG-2. A second bit rate mode, 50 Mbit/sec, is
- designated for HDTV.
-
- Pre-coded digital video from, e.g., broadcast sources will be directly
- recorded to tape and "passed-through" as a coded bitstream to the video
- decompression box upon tape playback. Assuming linear tape speed is
- proportional to bit rate, the recording time of a pre-compressed
- MPEG-2 program at the upper limit of 5 Mbit/sec for broadcast quality
- video would be over 20 hours. Channel coding
- schemes (error correction, convolution coding, etc.), however, will
- most likely be optimized for the tape medium and therefore may differ
- from the channel methods for cable, terrestrial, and satellite. (A
- Zenith-Goldstar S-VHS based experiment did, however, directly record the
- 4-VSB broadcast baseband signal of the old Zenith/AT&T HDTV proposal).
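-
- The "over 20 hours" estimate can be reproduced from the cassette figures
- summarized in this answer (4h30min at 25 Mbit/sec on the standard
- cassette). A minimal sketch, assuming recording time is inversely
- proportional to bit rate; the function name is hypothetical.

```python
def recording_hours(reference_hours, reference_mbit_s, coded_mbit_s):
    """Scale a known recording time to a different coded bit rate."""
    return reference_hours * reference_mbit_s / coded_mbit_s
```

- For the standard cassette, 4.5 hours at 25 Mbit/sec scales to 22.5 hours
- at 5 Mbit/sec.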
-
- More specs: (Summarized from EE Times July 5, 1993 article)
-
- tape width: 6.35 mm
- Audio: two channel 48 kHz 16-bit audio, or 4 channel at 32 kHz at 12-bit
- Tape format: metal evaporated tape, 13.5 microns thick
-
- Cassette dimensions (millimeters) and recording times:
-
- Size       Width   Height   Depth   525/625 (25 Mbit/s)   HDTV (50 Mbit/s)
- --------   -----   ------   -----   -------------------   ----------------
- Standard   125     78       14.6    4h30min               2h15min
- Small      66      48       12.2    1 hour                30min
-
- Linear tape speeds: 18.812 mm/s (60Hz), 18.831 mm/s (50 Hz)
- Video compression: DCT based
-
- Participants: Matsushita, Sony, Philips, Thomson, Hitachi, Mitsubishi,
- Sanyo, Sharp, Toshiba, JVC.
-
- MPEG in everyday life
-
- 74. Where will we see MPEG in everyday life?
- A. Just about wherever you see video today.
-
- DBS (Direct Broadcast Satellite)
- The Hughes/USSB DBS service will use MPEG-2 video and audio. Thomson
- has exclusive rights to manufacture the decoding boxes for the first 18
- months of operation. Hughes/USSB DBS will begin its U.S. service in
- April 1994. Two satellites at 101 degrees West will share the power
- requirements of 120 Watts per 27 MHz transponder over a total of 32
- transponders. Multi source channel rate control methods will be
- employed to optimally allocate bits between several programs normalized
- to one 22 Mbit/sec data carrier. Bit allocation adapts to instantaneous
- co-channel spatial and co-channel temporal activity. An average of 150
- channels are
- planned with the addition of a second set of satellites augmenting the power
- level of each transponder to 240 Watts. The coded throughput of each
- transponder will increase to 30 Mbit/sec.
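-
- One very simple way to picture this kind of joint rate control is to split
- the carrier in proportion to a per-program activity measure. The sketch
- below is only an illustration: the activity values, the per-program floor,
- and the function name are all assumptions, and nothing here describes the
- method actually used by the service.

```python
def allocate_bits(activities, carrier_mbit_s=22.0, floor_mbit_s=1.0):
    """Split one carrier between programs in proportion to their activity.

    Each program first receives a fixed floor rate; the remaining capacity
    is shared according to the relative activity values.
    """
    count = len(activities)
    spare = carrier_mbit_s - count * floor_mbit_s
    if spare < 0:
        raise ValueError("carrier too small for the per-program floor")
    total = sum(activities)
    if total == 0:
        # No activity information: share the carrier equally.
        return [carrier_mbit_s / count] * count
    return [floor_mbit_s + spare * a / total for a in activities]
```

- The allocations always sum to the carrier rate, so a burst of activity in
- one program is paid for by the quieter programs sharing the transponder.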
-
-
- CATV (Cable Television)
- Despite conflicting options, the cable industry has more or less
- settled on MPEG-2 video. Audio is less than settled. For example,
- General Instruments (the largest U.S. consumer cable set-top box
- manufacturer) have announced the planned exclusive use of Dolby AC-3.
- The General Instruments DigiCipher I video syntax is similar to MPEG-2
- syntax, but employs smaller macroblock predictions and no B-frames. The
- DigiCipher II specification will include modes to support both the GI
- and full MPEG-2 Video Main Profile syntax. Digicipher-I services such
- as HBO will upgrade to DigiCipher II in 1994.
-
- HDTV
- The U.S. Grand Alliance, a consortium of companies that formerly competed
- to win the U.S. terrestrial HDTV standard, have already agreed to
- use the MPEG-2 Video and Systems syntax---including B-pictures. Both
- interlaced (1920 x 1080 x 30 Hz) and progressive (1280 x 720 x 60 Hz)
- modes will be supported. The Alliance has also settled upon a modulation
- method (VSB), convolution coding (Viterbi), and error correction
- (Reed-Solomon) specification.
-
- In September 1993, a consortium of 85 European companies signed an
- agreement to fund a project known as Digital Video Broadcasting (DVB) which
- will develop a standard for cable and terrestrial transmission by the
- end of 1994. The scheme will use MPEG-2. This consortium has put the
- final nail in the coffin of the D-MAC scheme for gradual migration
- towards an all-digital, HDTV consumer transmission standard. The only
- remaining analog or digital-analog hybrid system left in the world is
- NHK's MUSE (which will probably be axed in a few years as soon as it appears
- to be a politically secure thing to do).
-
- 75. What is the best compression ratio for MPEG ?
- A. The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel
- inter. Experimentation has shown that intra frame coding with the
- familiar DCT-Quantization-Entropy hybrid algorithm achieves optimal
- performance at about an average of 1.2 bits/sample or about 6:1
- compression ratio. Below this point, artifacts become noticeable.
-
-
- 76. Is there a MPEG CD-ROM format?
- A. Yes, a consortium of international companies (Matsushita, Philips,
- Sony, JVC, et al) have agreed upon a specification for MPEG video and
- audio. 2 hour long movies are stored on two 650 MByte compact discs. The
- video rate is 1.15 Mbit/sec, the audio rate is either 128 kbit/sec or
- 192 kbit/sec Layer I or Layer II. (This seems to contradict the Philips
- 224 kbit/s audio spec?) Although the Video, Systems, and Audio syntax are
- identical, the CD-I
- movie format and the White Book format are not compatible.
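-
- The two-disc figure can be sanity-checked from the quoted rates. A minimal
- sketch, assuming one hour of program per disc, 1 MByte = 1,000,000 bytes,
- and no allowance for systems-layer multiplexing or sector overhead; the
- function name is hypothetical.

```python
def movie_megabytes(minutes, video_mbit_s=1.15, audio_kbit_s=192):
    """Approximate storage for a program at the given video and audio rates."""
    bits_per_second = video_mbit_s * 1_000_000 + audio_kbit_s * 1_000
    return bits_per_second * minutes * 60 / 8 / 1_000_000
```

- One hour at 1.15 Mbit/sec video plus 192 kbit/sec audio comes to roughly
- 604 MBytes, which is why a 2 hour movie needs two 650 MByte discs.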
-
- Researchers are busy experimenting with denser and faster rate CD
- formats, perhaps using green or blue laser wavelengths. One demonstration
- stretched the pit and track density to its limits, improving areal density by
- almost 2 fold.
-